View ST20-C1_1317330.PDF datasheet online --- IC-ON-LINE

Datasheet File OCR Text:

ST20-C1 Core Instruction Set Reference Manual
(R)
72-TRN-274-01
July 1997
1/205
Contents
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 1.1 ST20-C1 features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 1.2 Manual structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 2.1 2.2 2.3 2.4 3 3.1 3.2 3.3 3.4 4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 Instruction listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Instruction definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8 Operators used in the definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11 Data structures and constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13 Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16 Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20 Instruction encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24 Manipulating the evaluation stack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30 Loading and storing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .31 Expression evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .33 Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .35 Forming addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41 Comparisons and jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .43 Evaluation of boolean expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .45 Bitwise logic and bit operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .47 Shifting and byte swapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48 Function and procedure calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48 Peripherals and I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52 Status register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55 Data formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56 mac and umac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56 Short multiply accumulate loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56 Biquad IIR filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58 Data vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .59 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .59 Data formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .65
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16
Using ST20-C1 instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30
Multiply accumulate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56
2/205
(R)
Contents 6 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .70 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 7 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 8 Exception levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .71 Exception vector table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .72 Exception control block and the saved state. . . . . . . . . . . . . . . . . . . . . . .73 Initial exception handler state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .74 Restrictions on exception handlers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75 Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .76 Setting up the exception handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .76 Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78 Descheduled processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .79 Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .80 Timeslicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .81 Inactive processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .82 Descheduled process state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .82 Initializing multi-tasking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .83 Scheduling kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .84 Semaphores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .84 Sleep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .85
Multi-tasking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78
Instruction Set Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .87
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A B C D Constants and data structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .169 Instruction set summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .174 Compiling for the ST20-C1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .178 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .187
3/205
(R)
1.1 ST20-C1 features
1
Introduction
This manual provides a summary and reference to the ST20 architecture and instruction set for the ST20-C1 core. ST20 is a technology for building successful embedded VLSI designs. ST20 devices comprise a collection of VLSI macro-cells connected through a high-performance onchip bus. This architecture allows the easy construction of both general purpose (e.g. ST20-MC1 micro-controller) and application specific devices (e.g. ST20-TPx digital set top box family). The ST20 macro-cell library includes CPU micro-cores, on-chip memories and a wide range of digital and analogue I/O devices. SGS-THOMSON offers a range of ST20 CPU micro-cores, allowing the best cost vs. performance trade-off to be achieved in each application area. This manual describes the ST20-C1 CPU micro-core. ST20 devices are available from SGS-THOMSON and licensed second source vendors.
1.1
*
ST20-C1 features
It is implemented as a 2-way superscalar, 3-stage pipeline, with an internal 16word register cache. This architecture can sustain 4 instructions in progress, with a maximum of 2 instructions completing per cycle. It uses a variable length instruction coding scheme based on 8-bit units which gives excellent static and dynamic code size. Instructions take between 1 and 8 units to code, with an average of 1.25 units (10 bits) per instruction. It provides flexible prioritized vectored interrupt capabilities. The worst case interrupt latency is 0.5 microseconds (at 33 Mhz operating frequency). It provides extensive instruction level support for 16-bit digital signal processing (DSP) algorithms. It is particularly suitable for low power and battery-powered applications, with low core operating power, and sophisticated power management facilities. It provides extensive real-time debugging capability through the optional ST20 diagnostic controller unit (DCU) macro-cell, which supports fully non-intrusive breakpoints, watchpoints and code tracing. It has a flexible and powerful built-in hardware scheduler. This is a light-weight real-time operating system (RTOS) directly implemented in the microcode of the ST20-C1 processor. The hardware scheduler can be customized and provides support for software schedulers. It provides a built-in user-programmable 32-bit input/output register providing system control and communication capability directly from the CPU.
The ST20-C1 has the following features:
*
* * * *
*
*
4/205
(R)
1 Introduction
1.2
1 2 3
Manual structure
This introduction chapter, which explains the structure of the book; A notation chapter (Chapter 2) which explains the layout and notation conventions used in the instruction definitions and elsewhere; An architecture chapter (Chapter 3), which explains the structure of the ST20C1 core, the registers, memory addressing, the format of the instructions and the exception handling and process models; Four chapters on using the instructions and how the instructions can be used to achieve certain useful outcomes: Chapter 4 on the general instructions; Chapter 5 on multiply-accumulate; Chapter 6 on interrupts and traps; and Chapter 7 on processes and support for multi-tasking. An alphabetical listing of the instructions, one to a page (Chapter 8). Descriptions and formal definitions are presented in a standard format with the instruction mnemonic and full name of the instruction at the top of the page. The notation used is explained in detail in Chapter 2.
The manual is divided into the following chapters:
4
5
In addition there are appendices listing constants and structures, covering issues related to compiling for a ST20-C1 core and listing the instruction set plus a glossary for ST20-C1 terminology.
5/205
(R)
2.1 Instruction listings
2
Notation
This chapter describes the notation used throughout this manual, including the meaning of the instruction listings and the meanings and values of constants.
2.1
Instruction listings
The instructions are listed in alphabetical order, one to a page. Descriptions are presented in a standard format with the instruction mnemonic and full name of the instruction at the top of the page, followed by these categories of information: * * * * * * Code: the instruction code; Description: a brief summary of the purpose and behavior of the instruction; Definition: a more formal and complete description of the instruction, using the notation described below in section 2.2; Status Register: a list of errors and other changes to the Status Register which can occur; Comments: a list of other important features of the instruction; See also: cross references are provided to other instructions with related functions.
These categories are explained in more detail below, using the and instruction as an example. 2.1.1 Instruction name
The header at the top of each page shows the instruction mnemonic and, on the right, the full name of the instruction. For primary instructions the mnemonic is followed by `n' to indicate the operand to the instruction; the same notation is used in the description to show how the operand is used. An explanation of the primary and secondary instruction formats is given in section 3.4. 2.1.2 Code
The code of the instruction is the value that would appear in memory to represent the instruction. For secondar y instructions the instruction `operation code' is shown as the memory code -- the actual bytes, including any prefixes, which are stored in memory. The value is given as a sequence of bytes in hexadecimal, decoded left to right. The codes are stored in memory in `little-endian' format, with the first b yte at the lowest address. For example, the entry for the and instruction is: Code: F9 This means that the hexadecimal byte value F9 would appear in memory for an and.
6/205
(R)
2 Notation For primary instructions the code stored in memory is determined partly by the value of the operand to the instruction. In this case the op-code is shown as `Function x' where x is the function code in the last byte of the instruction. For example, adc (add constant) is shown as Code: Function 8 This means that adc 1 would appear in memory as the hexadecimal byte value 81. For an operand n in the range 0 to 15, adc n would appear in memory as 8n. 2.1.3 Description
The description section provides an indication of the purpose of the instruction as well as a summary of the behavior. This may include details of the use of registers, whose initial values may be used as parameters and into which results may be stored. For example, the and instruction contains the following description: Description: Bitwise AND of Areg and Breg. 2.1.4 Definition
The definition section provides a formal description of the behavior of the instruction. The behavior is defined in ter ms of its effect on the state of the processor, i.e. the changes in the values in registers and memory before and after the instruction has executed. The effects of the instruction on registers, etc. are given as statements of the following form: register expression involving registers, etc. expression involving registers, etc. memory_location
Primed names (e.g. Areg) represent values after instruction execution, while names without primes represent values when the instruction execution starts. For example, Areg represents the value in Areg before the execution of the instruction while Areg represents the value in Areg afterwards. So the example above states that after the instruction has been executed the register or memory location on the left hand side holds the value of the expression on the right hand side. Only the changed registers and memory locations are given on the left hand side of the statements. If the new value of a register or memory location is not given then the value is unchanged by the instruction. The description is written with the main function of the instruction stated first. For example the main function of the add instruction is to put the sum of Areg and Breg into Areg). This is followed by the other effects of the instruction, such as rotating the stack. There is no temporal ordering implied by the order in which the statements are written.
7/205
(R)
2.2 Instruction definitions For example, the and instruction contains the following description: Definition: Areg Breg Creg Breg Areg Creg Areg
This says that the integer stack is rotated and Areg is assigned the bitwise AND of the values that were initially in Breg and Areg. After the instruction has executed Breg contains the value that was originally in Creg, and Creg has the value that was in Areg. The notation is described more fully in section 2.2. 2.1.5 Status Register
This section of the instruction definitions lists an y changes to bits of the Status register which can occur. The Status register is described in more detail in section 3.3.2. 2.1.6 Comments
This section is used for listing other information about the instructions that may be of interest. This includes an indication of the type of the instruction: "Primary instruction" -- indicates one of the 13 functions which may be directly encoded with an operand in a single byte instruction. "Secondary instruction" -- indicates an instruction which is encoded using opr. An explanation of the primary and secondary instruction formats is given in section 3.4. The Comments section also describes any situations where the operation of the instruction is undefined or invalid and any limits to the parameter values. For example, the only comment listed for the and instruction is: Comments: Secondary instruction. This says that and is a secondar y instruction.
2.2
Instruction definitions
The following sections give a full description of the notation used in the formal definition section of the instruction descriptions.
8/205
(R)
2 Notation 2.2.1 The process state
The process state consists of the registers (Areg, Breg, Creg, Iptr, Tdesc, Wptr, and Status), and the contents of memory. A description of the meanings and uses of the registers and special memory locations and data structures is given in section 3.3. 2.2.2 General
The instruction descriptions are not intended to describe the way the instructions are implemented, but only their effect on the state of the processor. So, for example, the result of mul is shown in terms of an intermediate result calculated to infinite precision, although no such intermediate result is used in the implementation. Comments (in italics) are used to both clarify the description and to describe actions or values that cannot easily be represented by the notation used here; e.g. take timeslice trap. Some of these actions and values are described in more detail in other chapters. An ellipsis is used to show a range of values; e.g. `i = 0..31' means that i has values from 0 to 31, inclusive. Subscripts are used to indicate particular bits in a word; e.g. Aregi for bit i of Areg; and Areg0..7 for the least significant b yte of Areg. Note that bit 0 is the least significant bit in a word, and bit 31 is the most significant bit. Except for Iptr, certain reserved words of memory, and taking exceptions or switching processes, if the description does not mention the state of a register or memory location after the instruction, then the value will not be changed by the instruction. Iptr is assigned the address of the next instruction in the code before the instruction execution starts. The Iptr is included in the description only when there are additional effects of the instruction (e.g. in the jump instruction). In these cases the address of the next instruction is indicated by the comment `next instruction'. 2.2.3 Undefined v alues
Some instructions in some circumstances leave the contents of a register or memory location in an undefined state . This means that the value of the location may be changed by the instruction, but the new value cannot be easily defined, or is not a meaningful result of the instruction. For example, when division by zero is attempted, Breg and Creg become undefined, i.e. they do not contain any meaningful data. An undefined value is represented by the name undefined. The values of registers which become undefined as a result of executing an instruction are implementation dependent and are not guaranteed to be the same on different members or revisions of the ST20 family of processors. 2.2.4 Data types
The instruction set includes operations on three sizes of data: 8, 16 and 32-bit objects. 8-bit and 16-bit data can represent signed or unsigned integers and 32-bit data can
9/205
(R)
2.2 Instruction definitions represent addresses, signed or unsigned integers. Generally the arithmetic in signed. In some cases it is clear from the context (e.g. from the operators used) whether a particular object represents a signed or unsigned number. A subscripted label is added (e.g. Aregunsigned) to clarify where necessary. 2.2.5 Representing memory
The memory is represented by arrays of each data type. These are indexed by a value representing a byte address. Access to the three data types is represented in the instruction descriptions in the following way: byte[address]references a byte in memory at the given address sixteen[address]references a 16-bit half word in memory word[address]references a 32-bit word in memory For all of these, the state of the machine referenced is that before the instruction if the function is used without a prime (e.g. word[address]), and that after the instruction if the function is used with a prime (e.g. word[address]). For example, writing a value given by an expression, expr, to the word in memory at address addr is represented by: word[addr] expr and reading a word from a memory location is achieved by: register word[addr] Writing to memory in any of these ways will update the contents of memory, and these updates will be consistently visible to the other representations of the memory. For example, writing a byte at address 0 will modify the least significant byte of the word at address 0. Data alignment Generally, word and half word data items have restrictions on their alignment in memory. Byte values can be accessed at any byte address, i.e. they are byte aligned. 16-bit objects can only be accessed at even byte addresses, i.e. the least significant bit of the address must be 0. 32-bit objects must be word aligned, i.e. the 2 least significant bits of the address must be zero. Address calculation An address identifies a par ticular byte in memory. Addresses are frequently calculated from a base address and an offset. For different instructions the offset may be given in units of bytes or words depending on the data type being accessed. In order to calculate the address of the data, a word offset must be converted to a byte offset before being added to the base address. This is done by multiplying the offset by the number of bytes per word, i.e. 4. As there are many accesses to memory at word offsets, a shorthand notation is used to represent the calculation of a word address. The notation register @ x is used to
10/205
(R)
2 Notation represent an address which is offset by x words (4x bytes) from the address in register. For example, in the specification of load non-local there is: Areg word[Areg @ n] Here, Areg is loaded with the contents of the word that is n words from the address pointed to by Areg, i.e. the word at address Areg + 4n. In all cases, if the given base address has the correct alignment then any offset used will also give a correctly aligned address.
2.3
Operators used in the definitions
A full list of the operators used in the instruction definitions is given in Table 2.1. Unless otherwise stated, all arithmetic is signed.
Symbol + - x / rem < > = >> << >>arith Meaning Unchecked (modulo) integer arithmetic Signed integer add, subtract, multiply, divide and remainder. If the computation overflows the result of the operation is truncated to the word length. If a divide or remainder by zero occurs the result of the operation is undefined. No errors are signalled. The operator `-' is also used as a monadic operator. Signed comparison operators Comparisons of signed integer values: `less than', `greater than', `less than or equal', `greater than or equal', `equal' and `not equal'.
Bitwise operators `Not', `and', `or', `exclusive or', logical left and right shift and arithmetic right shift operations on bits in words.
Boolean operators not and or Boolean combination in conditionals.
Table 2.1 Operators used in the instruction descriptions Modulo operators Arithmetic is done using modulo arithmetic -- i.e. there is no checking for errors and, if the calculation overflows, the result `wraps around' the range of values representable in the word length of the processor -- e.g. adding 1 to the address at the top of the
11/205
(R)
2.3 Operators used in the definitions address map produces the address of the byte at the bottom of the address map. These operators are represented by the symbols `+', `-', etc. Error conditions Any errors that can occur in instructions which are defined in ter ms of the modulo operators are indicated explicitly in the instruction description. For example the add instruction indicates the cases that can cause overflow or underflow, independently of the actual addition: if (sum > MostPos) { Areg sum - 2BitsPerWord Statusunderflo w clear Statusoverflow set } else if (sum < MostNeg) { Areg sum + 2BitsPerWord Statusunderflo w set Statusoverflow clear } else { Areg sum Statusunderflo w clear Statusoverflow clear } ... 2.3.1 Functions
Type conversions The following notation is used to indicate the type cast of x to a 16-bit integer: int16 (x) If x is too large or too small to fit into a 16-bit integer then the result of the instruction is undefined. Double word splitting Where a calculation is performed using a 48-bit or 64-bit value, the value may be split into two words. The function low_word returns the least significant word and the function high_word returns the most significant word.
12/205
(R)
2 Notation 2.3.2 Conditions to instructions
In many cases, the action of an instruction depends on the current state of the processor. In these cases the conditions are shown by an if clause; this can take one of the following forms: * * if condition statement if condition statement else statement if condition statement else if condition statement else statement
*
These conditions can be nested. Braces, {}, are used to group statements which are dependent on a condition. For example, the cj (conditional jump) instruction contains the following lines: if (Areg = 0) Iptr next instruction + n else { Iptr next instruction Areg Breg Breg Creg Creg Areg } This says that if the value in Areg is zero, then the jump is taken (the instruction operand, n, is added to the instruction pointer), otherwise the stack is popped and execution continues with the next instruction.
2.4
Data structures and constants
A number of data structures have been defined in this man ual. Each comprises a number of data slots that are referenced by name in the text and the instruction descriptions. These data structures are listed in the tables in Appendix A. Each table gives the name of each slot in the structure and the word offsets from the base address of the structure. A slot in a data structure is identified using the offset notation descr ibed in section 2.2.5: word[base_address @ word_offset]
13/205
(R)
2.4 Data structures and constants For example, the back pointer of a semaphore structure at address sem would be: word[sem @ s.Back] In addition, several constants are used to identify fixed values for the ST20-C1 processor. All the constants are listed in Appendix A. Product identity value This is the value returned by the ldprodid instruction. For specific product ids in the ST20 family refer to SGS-THOMSON.
14/205
(R)
2 Notation
15/205
(R)
3.1 Values
3
Architecture
This chapter describes the general architectural features of the ST20-C1 core which are relevant to more than one instruction or group of instructions. Interrupts and traps are described in Chapter 6 and support for multi-tasking is described in Chapter 7. Other features which are related to specific tasks are descr ibed in Chapter 4. A full list of constants and data structures is given in Appendix A. The ST20-C1 instruction set covers: * * * * * * * * control flow arithmetic and logical operations bit field manipulations shifting and byte-swapping register manipulations memory access with various addressing modes and data sizes task scheduling direct input/output
3.1
Values
The ST20-C1 core supports data objects of different sizes, either signed or unsigned. The sizes directly supported are bytes (8-bit), half words (16-bit), words (32-bit) and multiple words (64-bit, 96-bit etc.). Bytes, half-words and words may be loaded and stored. Arithmetic operations are provided for signed words and multiple words. A half word is called a sixteen in the instruction names. The most negative integer (0x80000000) is known as MostNeg and the most positive (0x7FFFFFFF) as MostPos. Boolean objects, taking one of the values true or false, are also used by some instructions. False is represented by the value 0 and true has the value 1. Section 4.7 describes how other values may be implemented for language compilation. Several data structures are defined in this man ual. Each comprises a number of data words (sometimes called slots) that are referenced by name in the text and the instruction descriptions and addressed as offsets from the base of the data structure. A full list of these data structures and other constants is given in Appendix A. 3.1.1 Ordering of information
The ST20 is little-endian - i.e. less significant data is always held in lower addresses. This applies to bits in bytes, bytes in words and words in memory. Hence, in a word of data representing an integer, one byte is more significant than another if its byte selector is larger. Figure 3.1 shows the ordering of bytes in words for the ST20.
16/205
(R)
3 Architecture
Most significant 3 2
Bytes in a word 1
Least significant 0
Most significant
Bits in a word
Least significant
31
0
Figure 3.1 Bytes and bits in words For example, the most significant bit of a word is bit 31, and the most significant byte is byte 3, consisting of bits 24 to 31. This ordering is compatible with Intel processors, but not Motorola or SPARC. For compatibility with other devices, a swap32 instruction is provided to reverse the order of bytes within a word. 3.1.2 Signed integers and sign extension
A signed object is stored in twos-complement format. A signed value may be represented by an object of any size. Most commonly a signed integer is represented by a single word, but as explained, it may be stored, for example, in a 64-bit object, a 16-bit object, or an 8-bit object. In each of these formats, all the bits within the object contain useful information. The length of the object that stores a signed value can be increased, so that the object size is increased without changing the value that is represented. This operation is known as sign extension. All the extra bits that are allocated for the larger object, are meaningful to the value of the signed integer; they must therefore be set to the appropriate value. The value for all these extra bits is the same as the value of the most significant bit - i.e. the sign bit - of the smaller object. The ST20-C1 provides instructions that sign extend byte and half-word objects to words. The example shown in Figure 3.2 shows how the value -10 is stored in a 32-bit register, either as an 8-bit object or as a 32-bit object. In this case, bits 31 to 8 are meaningful for the 32-bit object but not for the 8-bit object. These bits are set to 1 in the 32-bit object.
17/205
(R)
3.2 Memory
these bit values not related to integer value bit position 31
11110110 87 0
signed integer value (-10) stored as an 8-bit object (byte)
11 bit position 31
...
111110110 87 0
signed integer value (-10) stored as a 32-bit object (word)
Figure 3.2 Storing a signed integer in different length objects
3.2
Memory
The ST20 processor is a 32-bit word machine, with byte addressing and a 4 Gbyte address space. This section explains how data is arranged in that address space. The address of an object is the address of the base, i.e. the byte with the lowest address. 3.2.1 Word address and byte selector
A machine address, or pointer, is a single word of data which identifies a byte in memory - i.e. a byte address. It comprises two parts, a word address and a byte selector. The byte selector occupies the two least significant bits of the word; the word address the thirty most significant bits. An address is treated as a signed value, the range of which starts at the most negative integer and continues, through zero, to the most positive integer. This enables the standard arithmetic and comparison functions to be used on pointer values in the same way that they are used on numerical values. Certain values can never be used as pointers because they represent reserved addresses at the bottom of memory space. They are reserved for use by the processor and initialization. A full list of names and values of constants used in this manual is given in Appendix A. In particular, the null process pointer (known as NotProcess) has the value MostNeg, since zero could be a valid process address. 3.2.2 Alignment
A data object is said to be word-aligned if it is at an address with a byte selector of zero, i.e. the full address of the object is divisible by 4. Similarly, a data object is said to
18/205
(R)
3 Architecture be half-word-aligned if it is at an address with an even byte selector, i.e. the full address of the object is divisible by 2. Word objects, including addresses, are normally stored word-aligned in memory. This is usually desirable to make the best use of any 32-bit wide memory. Also most instructions that involve fetching data from or storing data into memory, use word aligned addresses and load or store four contiguous bytes. However, there are some instructions that can manipulate part of a word. A half-word object is normally half-word-aligned, so it can be stored either in the least significant 16 bits of a word or in the most significant 16 bits. A data item that is represented in two contiguous words is called a double word object and is normally word-aligned. 3.2.3 Ordering of information in memory
Data is stored in memory using the little-endian rule. Objects consisting of more than one byte are stored in consecutive bytes, with the least significant byte at the lowest address and the most significant at the highest address . Figure 3.3 shows the ordering of bytes in words in memory. If X is a word-aligned address then the word at X consists of the bytes at addresses X to X+3, where the byte at X is the least significant b yte and the byte at X+3 is the most significant byte of the word. Memory (bytes)
32-bit words
X+7 X+6 X+5 X+4 X+3 X+2 X+1 X+0
MSB X+7
31 24 23
X+6
16 15
X+5
87
LSB X+4
0
MSB X+3
31 24 23
X+2
16 15
X+1
87
LSB X+0
0
X is a word-aligned byte address X+n is the byte n bytes past X
Figure 3.3 Bytes in words in memory
3.2.4
Work space
The ST20-C1 uses a stack-based data structure in memory to hold the local working data of a program, called the work space. The work space is a word-aligned collection of 32-bit words pointed to by the work space pointer register (Wptr). The programmer's model is that all local data is held in the work space, i.e. in memory, and must be brought into the evaluation stack to be operated on, and then written back from the evaluation stack to the work space.
19/205
(R)
3.3 Registers An implementation of the ST20-C1 core may include a register cache. This provides a mechanism to accelerate access to local work space without changing the programmer's model of how the work space operates or impacting either the excellent code density or low interrupt latency associated with a stack-based instruction set.
3.3
Registers
This section introduces the ST20-C1 core registers that are visible to the programmer. Seven registers, known as process state registers, define the local state of the executing process. These registers are preserved through exceptions. One other register is provided for performing input/output, and is not preserved through exceptions. All registers are 32-bit. Each instruction explicitly refers to specific registers , as described in the instruction definitions. The state of an executing process at any instant is defined b y the contents of the machine registers listed in Table 3.1. The registers are illustrated in Figure 3.4 and described in the rest of this section.
Register Areg Breg Creg Iptr Status Wptr Tdesc IOreg Evaluation stack register A Evaluation stack register B Evaluation stack register C Instruction pointer register, pointing to the next instruction to be executed Status register Work space pointer, pointing to the stack of the currently executing process Task descriptor Input and output register Description
Table 3.1 Processor registers
ST20-C1 Core Evaluation Stack Areg Breg Creg Status IOReg Local Program Data Task Descriptor Tdesc Instruction Pointer Iptr Program Code Memory Task Control Block
Workspace Pointer Wptr
offset base
Figure 3.4 Register set
20/205
(R)
3 Architecture 3.3.1 Evaluation stack
The registers Areg, Breg and Creg are organized as a three register evaluation stack, with Areg at the top. The evaluation stack is used for expression evaluation and to hold operands and results of instructions. Generally, instructions may pop values from or push values onto the evaluation stack or both, and do not address individual evaluation stack registers. Pushing a value onto the stack means that the value initially in Breg is pushed into Creg, the value in Areg is pushed into Breg and the new value is put in Areg. Popping a value from the stack means that a value is taken from Areg, the value initially in Breg is popped into Areg, and the value in Creg is popped into Breg.The value left in Creg varies between instructions, but is generally the value initially in the Areg. These actions are illustrated in Figure 3.5 and Figure 3.6.
Before Areg Breg Creg a b c
After x a b
Figure 3.5 Pushing a value x onto the evaluation stack
Before Areg Breg Creg a b c
After b c a
Figure 3.6 Popping a value from the evaluation stack 3.3.2 Status register
The status register contains status bits which describe the current state of the executing process and any errors which may have been detected. Initially the status register is set to the value given in Table 7.3. The contents of the status register are summarized in Table 3.2 and described in more detail in the following paragraphs. Generally the status register is local except for the
21/205
(R)
3.3 Registers
Bit numbers 0-7 8 - 10 11 - 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 - 31
Full name mac_count mac_buffer mac_scale mac_mode global_interrupt_enable local_interrupt_enable overflo w underflo w carry user_mode interrupt_mode trap_mode sleep reserved start_next_task timeslice_enable timeslice_count
Meaning when set or meaning of value Multiply-accumulate number of steps. Multiply-accumulate data buffer size code. Multiply-accumulate scaling code. Multiply-accumulate accumulator format code. Enable external interrupts until explicitly disabled. Enable external interrupts. Clearing this bit disables interrupts until the current process is descheduled. An arithmetic operation gave a positive overflow. An arithmetic operation gave a negative overflow. An arithmetic operation produced a carry. A user process is executing. An interrupt handler is executing or trapped. A trap handler is executing. The processor is due to go to sleep. Reserved. The CPU must start executing a new process. Timeslicing is enabled. Timeslice counter.
Table 3.2 Status register bits global interrupt enable and timeslice enable, which are global and carried from one process to another across a context switch. * The mac_count, mac_buffer, mac_scale and mac_mode fields are used b y the multiply-accumulate instructions to hold initialization data which must be saved when an exception occurs. See Chapter 5 for details of multiply accumulation. The global_interrupt_enable bit enables external interrupts. Interrupts remain enabled or disabled until explicitly disabled or enabled again. This bit is global and is maintained when a process is descheduled. The local_interrupt_enable enables external interrupts. Clearing this bit disables external interrupts until the current process is descheduled. This is needed when a process delegates part of its processing to a peripheral and then deschedules until completion, as described in section 4.11.3. Overflo w, underflo w and carry bits relating to arithmetic state are kept in the status word. The ST20-C1 maintains "sticky" bits in the status word which indicate whether an overflow or underflow has occurred. This allows a complete expression to be evaluated before testing whether an overflow has occurred. Overflow and underflo w are chosen as they apply both to addition as well as multiply as opposed to a more traditional method of replicating two bits out of the carry
*
*
*
22/205
(R)
3 Architecture chain. In addition, they allow for saturated arithmetic to be implemented relatively easily. A (non-sticky) carry bit is provided to allow efficient implementation of long addition and subtraction. The carry bit is only manipulated by the addc and subc instructions allowing the other add instructions to be used in address formation of multi-word values where carry propagation is required so that the carry is not lost in the address formation evaluations. * The user_mode bit indicates when the machine is handling a user process, i.e. a process which is not an exception handler. The interrupt_mode bit indicates when the machine is handling an interrupt, or a trap from an interrupt handler and the trap_mode bit indicates when the machine is executing a trap handler. An operating system may need to distinguish between modes to allow it to perform scheduling activities from a trap handler. These bits are also required to enable the eret instruction to determine whether a signal to the interrupt controller is required. The sleep bit indicates that the CPU is due to go to sleep, i.e. to turn off its clocks and go into low power mode. This bit is set when the CPU detects there is no user process to execute and is cleared when the CPU goes to sleep. The start_next_task bit when set causes the processor to attempt to run the next process from the scheduling queue. The timeslice_enable bit and timeslice_count field are used for timeslicing, as described in section 7.4.
*
* *
The instructions to use the status register are described in section 4.12. 3.3.3 The work space pointer
All programs need somewhere to store local working data, e.g. local variables in the application code. In the ST20 architecture, this local storage is termed the work space of the program. The Wptr register is the local work space pointer, which holds the address of the stack of the executing process. The stack is downward pointing, so space is allocated by moving the Wptr to a lower address. This address is word aligned and therefore has the two least significant bits set to zero. When a process is descheduled, the Wptr is stored as part of the process descriptor block, which is pointed to by Tdesc. The Wptr is used as a base for addressing local variables. A word offset from the Wptr is the operand for the instructions ldl (load local), stl (store local) and ldlp (load local pointer). The ST20-C1 simplifies the nor mal stack scheme by decoupling the load/store action from the pointer update: * * Load-local and store-local instructions access values in the work space with addresses relative to the Wptr, but do not change the value of Wptr. Separate instructions (ajw, gajw) are provided to update the work space
23/205
(R)
3.4 Instruction encoding pointer by any amount in one step without needing a series of increments or decrements. On calling a function or procedure, the Wptr is normally decreased to a lower address to allocate space for the parameters and local variables of the function. This is performed using the instruction ajw. The Wptr is returned to its initial value before returning from the function to free the local work space. 3.3.4 The task descriptor
The task descriptor Tdesc points to the process descriptor block for the currently executing process. The value held in the Tdesc becomes the process identifier when the process is not executing. The process descriptor block is a block of memory whose contents depend on the state of the process. It will generally hold the saved Wptr and Iptr for the process, and may hold a link to the next process if the process is in a queue of waiting processes. The process descriptor block is described in section 7.2. 3.3.5 IO register
The bits of the IOreg are mapped to external connections on the ST20-C1 core. They may be used to signal to, or read signals from, peripherals on or off chip. The io instruction is used to read and write to the IOreg and is described in section 4.11. The IOreg is global, and remains unchanged by any context switch. The bits of the IOreg are defined in Table 3.3.
Bits 0-15 16-31 Output data Input data Purpose
Table 3.3 IOreg bits In some ST20 variants, some bits of the IO register may be reserved for system use. The reserved bits will be the most significant bits of the appropr iate half word. The number of any such bits is given in the data sheet for each variant.
3.4
Instruction encoding
The ST20-C1 is a zero-address machine. Instruction operands are always implicit and no bits are needed in the instruction representation to carry address or operand location information. This results in very short instructions and exceptionally high code density. The instruction encoding is designed so that the most commonly executed instructions occupy the least number of bytes. This reduces the size of the code, which saves memory and reduces the memory bandwidth needed for instruction fetching. This section describes the encoding mechanism. A sequence of single byte instruction components is used to encode an instruction. The ST20 interprets this sequence at the instruction fetch stage of execution. Most
24/205
(R)
3 Architecture programmers, working at the level of microprocessor assembly language or high-level language, need not be aware of the existence of instruction components and do not generally need to consider the encoding. This section has been included to provide a background. Appendix C discusses consequential issues which need to be considered in order to implement a code generator. 3.4.1 An instruction component
Each instruction component is one byte long, and is divided into two 4-bit parts. The four most significant bits of the byte form a function code, and the four least significant bits are used to build an instruction data value as shown in Figure 3.7.
function code 7 4 3
data 0
Figure 3.7 Instruction format This representation provides for sixteen function code values (one for each function), each with a data field ranging from 0 to 15. Instructions that specify the instruction directly in the function code are called primary instructions or functions. There are 13 primary instructions, and the other three possible function code values are used to build larger data values and other instructions. Two function code values, pfix and nfix, are used to extend the instruction data value by prefixing. One function code operate (opr) is used to specify an instruction indirectly using the instruction data value. opr is used to implement secondar y instructions or operations. 3.4.2 The instruction data value and prefixing
The data field of an instr uction component is used to create an instruction data value. Primary instructions interpret the instruction data value as the operand of the instruction. Secondary instructions interpret it as the operation code for the instruction itself.
mnemonic name prefix negative prefix
pfix n nfix n
Table 3.4 Prefixing instr uction components The instruction data value is a signed integer that is represented as a 32-bit word. For each new instruction sequence, the initial value of this integer is zero. Since there are only 4 bits in the data field of a single instruction component, it is only possible for most instruction components to initially assign an instruction data value in the range 0 to 15. Prefix components are used to extend the range of the instruction data value.
25/205
(R)
3.4 Instruction encoding One or more prefixing components may be needed to create the full instruction data value. The prefixes are shown in Table 3.4 and explained below. All instruction components initially load the four data bits into the least significant four bits of the instruction data value.
pfix loads its four data bits into the instruction data value, and then shifts this value up four places. Consequently, a sequence of one or more prefixes can be used to extend the data value of the following instruction to any positive value. Instruction data values in the range 16 to 255 can be represented using one pfix. nfix is similar, except that it complements all 32 bits of the instruction data value before shifting it up, thus changing the sign of the instruction data value. Consequently, a sequence of one or more pfixes with one nfix can be used to extend the data value of a following instruction to any negative value. Instruction data values in the range -256 to -1 can be represented using one nfix.
When the processor encounters an instruction component other than pfix or nfix, it loads the 4-bit data field into the instruction data value. The instruction encoding is now complete and the instruction can be executed. The instruction data value is then cleared so that the processor is ready to fetch the next instruction component, by building a new instruction data value. For example, to load the constant 0x11, the instruction ldc 0x11 is encoded with the sequence:
pfix 1; ldc 1
The instruction ldc 0x2A68 is encoded with the sequence:
pfix 2; pfix A; pfix 6; ldc 8
The instruction ldc -1 is encoded with the sequence:
nfix 0; ldc F
3.4.3 Primary Instructions
Research has shown that computers spend most time executing a small number of instructions such as: * * * instructions to load and store from a small number of `local' variables; instructions to add and compare with small constants; and instructions to jump to or call other parts of the program.
For efficiency, in the ST20 these are encoded directly as primary instructions using the function field of an instruction component. Thirteen of the instruction components are used to encode the most important operations performed by any computer executing a high level language. These are used (in conjunction with zero or more prefixes) to implement the primary instructions. Primary instructions interpret the instruction data value as an operand for the instruction. The mnemonic for a primary instruction always includes this operand, shown in this manual as n.
26/205
(R)
3 Architecture The mnemonics and names for the primary instructions are listed in Table 3.5.
mnemonic name add constant adjust work space function call conditional jump equals constant jump load constant load local load local pointer load non-local load non-local pointer store local store non-local
adc n ajw n fcall n cj n eqc n jn ldc n ldl n ldlp n ldnl n ldnlp n stl n stnl n
Table 3.5 Primary instructions 3.4.4 Secondary instructions
The ST20 encodes all other instructions, known as secondary instructions, indirectly using the instruction data value.
mnemonic name operate
opr n
Table 3.6 Operate instruction The function code opr causes the instruction data value to be interpreted as the operation code of the instruction to be executed. This selects an operation to be performed on the values held in the evaluation stack, so that a further 16 operations can be encoded in a single byte instruction. The pfix instruction component can be used to extend the instruction data value, allowing any number of operations to be encoded. Secondary instructions do not have an operand specified b y the encoding, because the instruction data value has been used to specify the operation. To ensure that programs are represented as compactly as possible, the operations are encoded in such a way that the most frequently used secondary instructions are represented without using prefix instructions. For example, the instruction add is encoded by:
opr 4
The instruction and is encoded by:
opr F9
27/205
(R)
3.4 Instruction encoding which is in turn encoded with the sequence:
pfix F; opr 9
3.4.5 * * Summary of encoding It produces very compact code. It simplifies language compilation, by providing a completely uniform way of allowing a primary instruction to take an operand of any size up to the processor word-length. It allows these operands to be represented in a form independent of the wordlength of the processor. It enables any number of secondary instructions to be implemented.
The encoding mechanism has important consequences.
* *
To aid clarity and brevity, prefix sequences and the use of opr are not explicitly shown in this guide. Each instruction is represented by a mnemonic, and for primary instructions an item of data, which stands for the appropriate instruction component sequence. Hence the examples above would be just shown as: ldc 17, add, and and. Where appropriate, an expression may be placed in a code sequence to represent the code needed to evaluate that expression.
28/205
(R)
3 Architecture
29/205
(R)
4.1 Manipulating the evaluation stack
4
Using ST20-C1 instructions
This chapter describes the purpose for which the sequential instructions are intended, except for the multiply-accumulate instructions, which are described in Chapter 5. These instructions are described in the context of their intended use. Some instructions are designed for use in a particular sequence of instructions, so this chapter describes those sequences. Instructions for exceptions are described in Chapter 6 and multi-tasking instructions are described in Chapter 7. The architecture of the ST20-C1, including the registers and memory arrangement, is described in Chapter 3.
4.1
Manipulating the evaluation stack
The evaluation stack consists of the registers Areg, Breg and Creg. The general action of the evaluation stack is described in section 3.3.1. Instructions are provided for shuffling and re-order ing the values on the evaluation stack, as listed in Table 4.1.
Mnemonic Name rotate stack anti-rotate stack duplicate stack reverse stack
rot arot dup rev
Table 4.1 Evaluation stack manipulation instructions
rot pops the value from Areg off the evaluation stack and rotates it into Creg, and arot pushes the value from Creg onto the stack. rev swaps the Areg and Breg, and dup pushes a copy of Areg onto the stack.
Table 4.2 shows how each of these affects the evaluation stack. Each row shows the contents of the evaluation stack after one of these instructions is executed if the initial values of the Areg, Breg and Creg are a, b and c respectively.
Instruction Areg b c b a Breg c a a a Creg a b c b
rot arot rev dup
Table 4.2 Evaluation stack manipulation Many instructions leave the initial Areg in Creg. This value may be restored into the Areg by using arot.
30/205
(R)
4 Using ST20-C1 instructions
4.2
Loading and storing
Name load constant load local store local load non-local store non-local load byte and increment store byte and increment load sixteen and increment load sixteen sign extended and increment store sixteen and increment load word and increment store word and increment Load the constant n. Load the value from n words above Wptr. Store a value to n words above Wptr. Load the value from n words above Areg. Store a value to n words above Areg. Load a byte and increment the address by 1 byte. Store a byte and increment the address by 1 byte. Load a half word and increment the address by 2 bytes. Load a half word and sign extend to 32 bits and increment the address by 2 bytes. Store a half word and increment the address by 2 bytes. Load a word and increment the address by 4 bytes. Store a word and increment the address by 4 bytes. Description
The loading and storing instructions are listed in Table 4.3.
Mnemonic
ldc n ldl n stl n ldnl n stnl n lbinc sbinc lsinc lsxinc ssinc lwinc swinc
Table 4.3 Loading and storing instructions On the ST20, the term loading means pushing a value onto the evaluation stack. The value to be loaded may be a value read from memory, a constant, a copy of another register or a calculated value. Storing means popping a value from the evaluation stack. The value may be written into memory or written into another register. The evaluation stack is described in section 3.3, and evaluation of expressions is described in section 4.3. Relative addresses are used for accessing memory in order to reduce code size, as the operand values are smaller than full machine addresses. Data structures are word-aligned, so relative addresses can be word offsets, reducing the operand size further. The most common operations performed by a program are loading and storing of a small number of variables, and loading small literal values. 4.2.1 Loading constants
One primary instruction ldc is provided for loading a general constant, for initializing a variable or register or for a constant in an expression. 4.2.2 Local and non-local variables
When loading from and storing to memory, the ST20 distinguishes between local and non-local addressing. Local addressing means that the address is given as a word offset from the Wptr. Non-local addressing means that the address is given as a word offset from the Areg. In practice, the Wptr points to the stack, so local addressing is
31/205
(R)
4.2 Loading and storing normally used for local variables on the stack while non-local addressing is normally used for all other variables. The primary instructions ldl and stl perform loading and storing of local variables. For example to load a value x words above the Wptr and write to a location y words above the Wptr:
ldl x; stl y;
The primary instructions ldnl and stnl perform loading and storing of non-local variables. For example, to load a value x above a base address x_base and store to a location y words above y_base, where x_base and y_base are held in local variables:
ldl x_base; ldnl x; ldl y_base; stnl y;
Note that for the purposes of this manual, ld X denotes loading the value from a variable X, where X may be a local or non-local variable, so either ldl or ldnl may be used as appropriate. Similarly st X denotes storing a value into a variable X, where X may be a local or non-local variable, so either stl or stnl may be used. 4.2.3 Byte and half-word values
Instructions are provided for loading and storing byte and half-word variables. In each case, the address is initially in the Areg and is incremented by the size of the object, so that repeated loads and stores can be used to copy a block of memory. The load instructions place the loaded value in the Areg, the incremented address in the Creg and leave the Breg unaffected. The store instructions write the initial Breg into memory at the address in the Areg, leaving the incremented address in the Breg, the initial Creg in the Areg and the initial Breg pushed down to the Creg. Byte loading and storing
lbinc loads the byte at the address in Areg, into the evaluation stack. lbinc replaces the address in Areg with the byte stored at that address, treating it as an unsigned integer by setting the twenty-four most significant bits in Areg to 0. The incremented address is left in Creg. sbinc writes the least significant b yte in Breg to the location addressed by Areg. The address is incremented by 1 and put in the Breg.
Half-word loading and storing
lsinc and lsxinc load the half-word object at the address in Areg, into the evaluation stack. lsinc replaces the address in Areg with the half word, treating it as an unsigned integer by setting the sixteen most significant bits in Areg to 0. lsxinc is similar to lsinc, but treats the half-word as a signed integer in twos-complement format, and hence sign extends the representation by setting the sixteen most significant bits in Areg to the same value as the most significant bit of the half-word object. Sign extension is discussed in section 4.4.6.
32/205
(R)
4 Using ST20-C1 instructions
ssinc writes the half word in the two least significant b ytes of Breg to the location addressed by Areg.
4.2.4 Memory block copy
A block memory copy may be implemented using the instructions lwinc and swinc. These instructions load or store a word, and increment the addresses used. To copy n bytes from source to destination, where source and destination are both word-aligned, a loop should be written, using the temporary variable limit, as in the following code:
ld source; ld n; ld destination add; stl limit LOOP: lwinc; rev; swinc ldl limit; arot gt; cj END; j LOOP; END:
This is the most efficient method of copying, since it reads and writes full words, making the best use of any 32-bit memory. However, this is not always possible if the alignment of the source and destination blocks are different. In that case the byte or half-word load and store should be used.
4.3
Expression evaluation
Expression evaluation and address calculation is performed using the evaluation stack. For example, the evaluation of operations with two integer operands is performed by instructions that operate on the values of Areg and Breg. The result is left in Areg. Arithmetic and boolean calculations are considered in sections 4.4 and 4.7 respectively. This section describes how the evaluation stack is used. Loading and storing instructions are described in section 4.2. In this and subsequent sections, in examples of assembly code, a single letter or identifier wr itten as an instruction is either an expression or a segment of code. If it is an expression then it means `evaluate the expression and leave the result in the Areg. 4.3.1 Using the evaluation stack
A compiler normally loads a constant expression c using ldc:
ldc c
Loading from a constant table is described in section 4.3.3. An expression consisting of a single local variable is loaded using
ldl x
Methods for loading non-local variables are discussed in section 4.2, and array elements in section 4.5.
33/205
(R)
4.3 Expression evaluation Evaluation of expressions sometimes requires the use of temporary variables in the process work space, but the number of these can be minimized by careful choice of the evaluation order. The details of how this is achieved by a compiler are described in Appendix C in section C.3. 4.3.2 Loading operands
The three registers of the evaluation stack are used to hold operands of instructions. Evaluation of an operand or parameter may involve the use of more than one register. Care is needed when evaluating such operands to ensure that the first oper and to be loaded is not pushed off the bottom of the evaluation stack by the evaluation of later operands. The processor does not detect evaluation stack overflow. Three registers are available for loading the first operand, two registers for the second and one for the third. Consequently, the instructions are designed so that Creg holds the operand which, on average, is the most complex, and Areg the operand which is the least complex. In some cases, it is necessar y to evaluate the Areg and Breg operands in advance, and to store the results in temporary variables. This can sometimes be avoided using the reverse instruction. Any of the following sequences may be used to load the operands A, B and C into Areg, Breg and Creg respectively.
1 2 3 4
C; B; A; C; A; B; rev; B; C; rev; A; A; C; rev; B; rev;
The choice of loading sequence, and of which operands should be evaluated in advance is determined by the number of registers required to evaluate each of the operands. The algorithm used by compilers is given in Appendix C in section C.4. 4.3.3 Tables of constants
The ST20-C1 instruction set has been optimized so that the loading of small constants can be coded compactly -- for example it allows the loading of constants between 0 and 15 to be coded in a single byte. Analysis of programs shows that such small constants occur markedly more frequently than large constants. However when a large constant does need to be loaded the necessary prefix sequence may be long. Other techniques may be more efficient in these cases . A simple mechanism to increase the code compactness is to use a table of constants. This is implemented by storing all the long constants into a look-up table. This table and all its constant entries must be aligned on a word boundary. The address of this table is held in a local variable which is used to index the array. Then to load the
34/205
(R)
4 Using ST20-C1 instructions constant from the nth entry in the constant table stored at address constants_ptr the following code would be used:
ldl constants_ptr ; ldnl n;
where the instruction ldnl n is explained in section 4.2.2. This code sequence only takes 2 bytes, provided constants_ptr is less than 16 words from the work space pointer address and there are no more than 16 word-length constants. At worse it is unlikely to take more than 4 bytes. Hence, if a constant takes 4 or more bytes to load using ldc then this sequence often improves code compactness -- especially if the constant is used more than once. 4.3.4 Assignment
Single words, half words and bytes may be assigned using the load and store instructions described in section 4.2. Word assignment If x and y are both single word variables and e is a word valued expression then word assignments are compiled as
x=y x=e
Byte assignment
compiles to compiles to
ld y; st x; e; st x;
If a and b are both single byte variables and e is a byte valued expression then byte assignments are compiled as
b=a b=e
compiles to compiles to
address(a); lbinc; address(b); sbinc; e; address(b); sbinc;
where address(variable) is the address of variable. Forming addresses is discussed in section 4.5. Half word assignment If a and b are both half-word variables and e is a half-word valued expression then half-word assignments are compiled as
b=a b=e
compiles to compiles to
address(a); lsinc; address(b); ssinc; e; address(b); ssinc;
where address(variable) is the address of variable. Forming addresses is discussed in section 4.5.
4.4
Arithmetic
This section describes the use of the arithmetic instructions except for the multiplyaccumulate instructions, which are described in Chapter 5, and forming addresses, which is described in section 4.5. Boolean expression evaluation is discussed in
35/205
(R)
4.4 Arithmetic section 4.7, and the general principles of expression evaluation are described in section 4.3. 4.4.1 Addition, subtraction and multiplication
Single length signed arithmetic is provided by the operations listed in Table 4.4.
Mnemonic Name add constant add subtract multiply short multiply
adc n add sub mul smul
Table 4.4 Single length signed integer arithmetic instructions Each of these instructions except smul can signal overflow or underflow by setting the appropriate bit in the status register. An overflow occurs if the result is greater than MostPos and an underflow if it is less than MostNeg. If overflow or underflow occurs, then the 32 least significant bits of the full result are left in the Areg. The overflow and underflow are `sticky', so when one has been set, it is not cleared and the other cannot be set by subsequent arithmetic. The overflow and underflo w bits may be used for saturated arithmetic, as described in section 4.4.3. The primary instruction adc n adds the constant value n to Areg. Breg and Creg are unaffected. This is used for incrementing and decrementing variables and counters. If op is one of add, sub, mul or smul, then the instruction sequence
ldl X; ldl Y; op;
evaluates the expression
X op Y
i.e. it takes the value in Breg as the left hand operand and the value in Areg as the right hand operand, and loads the result into Areg. The content of Creg is popped into Breg and the initial Areg is rotated into Creg.
smul multiples two half-word values producing a 32-bit result. It cannot overflow or underflow and is faster than mul.
4.4.2 Division and remainder
Division and remainder are performed using the operations listed in Table 4.5.
Mnemonic Name divide step unsign argument
divstep unsign
Table 4.5 Division and remainder instructions
36/205
(R)
4 Using ST20-C1 instructions Each divstep generates four bits of the unsigned quotient, so eight divsteps are needed for a full 32-bit unsigned division, and will also generate a remainder. The result of the division is the integer division rounded towards zero (truncated). The quotient is left in Breg, and the remainder in Creg, so a rotation pops the quotient into the Areg.
unsign is used to separate the sign from the magnitude of the operands before performing the division. Division is then performed on the magnitudes, and the signs of the results may be derived from the signs of the operands.
Overflow can occur only if the divisor (Areg) is zero, or if the dividend (Breg) is MostNeg and the divisor is -1. divstep does not detect these cases, and does not set any status bits, so a check should be applied before performing the division. The following code sequence performs the integer division a/b. The signed quotient is left in Areg.
POS:
a; b; ldc 0; arot; unsign; arot; unsign; cj POS; ldc 0; rot; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; rot; not; adc 1; j END; rot; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; rot;
END:
The following code sequence performs the remainder a rem b. The signed remainder is left in Areg.
POS:
a; b; ldc 0; rev; unsign; eqc 2; arot; unsign; cj POS; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; arot; not; adc 1; j END; rot; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; arot;
END:
4.4.3 Saturated arithmetic
In saturated arithmetic, when an overflow or underflow occurs the result is set to the most positive or most negative possible result respectively, instead of the least significant bits of the full result. This ensures that the result is as near as possible to the real value and prevents glitches caused by wrap-around.
37/205
(R)
4.4 Arithmetic Saturated arithmetic is achieved on the ST20-C1 by evaluating an expression and then performing the saturate instruction. If an overflow or underflow has occurred then the corresponding status bit will have been set, which will cause saturate to change the value in Areg to the most positive or most negative value respectively. saturate clears the overflow and underflow bits. For example, to perform a saturated multiply of a and b:
ld a; ld b; mul; saturate;
4.4.4 Unary minus
The expression (-e) can be evaluated with overflow signalling by:
e; not; adc 1;
or
ldc 0; e; sub;
The first sequence , using not, requires one less stack register than the second. not is a bitwise inversion which is described in section 4.8. 4.4.5 Long arithmetic
The long arithmetic instructions are listed in Table 4.6.
Mnemonic Name add with carry subtract with carry unsigned multiply accumulate
addc subc umac
Table 4.6 Long arithmetic instructions Multiple length addition and subtraction Multiple length addition or subtraction are performed using addc and subc, executed once for each word of the result. For both instructions, the carry (or borrow) is held in the carry bit of the status register. This keeps the carrying separate from overflow, so address calculations may be safely performed using add, sub and wsub without affecting the carry. The addc instruction forms (Breg + Areg) + Statuscarry leaving the least significant word of the result in Areg and the most significant (carr y) bit in the carry bit of the status register. The Areg is rotated into the Creg. Similarly, the subc instruction forms (Breg - Areg) - Statuscarry leaving the least significant word of the result in Areg and the borrow bit in the carry bit of the status register. The Areg is rotated into the Creg. Addition of two double length unsigned values, X and Y, giving Z, without overflow signalling can therefore be compiled as follows
38/205
(R)
4 Using ST20-C1 instructions
ldc 0; ldl Xlo; ldl Ylo; addc; stl Zlo; ldl Xhi; ldl Yhi; addc; stl Zhi
The subscripts `lo' and `hi', used here and in subsequent text, specify the least and most significant word respectively of the double word variable with which they are associated. Subtraction of two double length values, Y from X giving Z, without overflow signalling is compiled as
ldc 0; ldl Xlo; ldl Ylo; subc; stl Zlo; ldl Xhi; ldl Yhi; subc; stl Zhi
Overflow signalling for signed arithmetic may be added by performing an extra addc or subc to produce a final w ord which contains only a sign (0 for positive or -1 for negative) unless an overflow has occurred. For example, the following code could be used to perform double length signed addition with overflow signalling:
clear carry, overflow and underflow status bits ld Xlo; ld Ylo; addc; st Zlo; ld Xhi; ld Yhi; addc; st Zhi; ldc 0; dup; addc; dup; adc #7ffffff; - overflows if and only if carry word > 0 rev; adc #8000001; - underflows if and only if carry word < -1
Multiple length multiplication The umac instruction multiplies two single word unsigned operands in Areg and Breg, and adds the single word carry operand in Creg to form a double length unsigned result. The more significant (carr y) word of the result is left in Breg, the less significant in Areg. No overflow can be signalled by this instruction. Multiplication of a single length unsigned value X by a double length unsigned value Y (leaving the `carry' in Areg) can be performed by:
ldc 0; ldl X; ldl Ylo; umac; stl Zlo; ldl X; ldl Yhi; umac; stl Zhi
Double length unsigned multiplication is more complex. The product of two unsigned double length words X and Y can be expressed as:
X * Y= (Xhi*232 + Xlo)*(Yhi*232 + Ylo) = (Xhi*Yhi)*264 + (Xhi*Ylo + Xlo*Yhi)*232 + (Xlo*Ylo)
This can be coded as follows:
ldc 0; ldl Xlo; ldl Ylo; umac; stl Z0 ldl Xlo; ldl Yhi; umac; rev; stl Z2
39/205
(R)
4.4 Arithmetic
ldl Xhi; ldl Ylo; umac; stl Z1; ldl Xhi; ldl Yhi; umac; rev; stl Z3; ldc 0; rev; ldl Z2; addc; stl Z2; ldl Z3; addc; stl Z3
This gives a quadruple length unsigned result Z, where Z0 is the least significant and Z3 the most significant word of Z. 4.4.6 Object length conversion
Object length conversion operations are provided by the instructions listed in Table 4.7.
Mnemonic Name sign extend byte to word sign extend sixteen to word
xbword xsword
Table 4.7 Object length conversion instructions Section 3.1 explains that data can be represented in data objects of various sizes. This section describes the instructions that can be used to convert between these representations. Most of the ST20-C1 integer arithmetic instructions operate on signed integers held in the evaluation stack registers as 32-bit objects, and produce results in this form. Object length conversion is important for conversion of high level language data types. The ST20-C1 therefore provides instructions that allow a byte or half-word signed integer to be sign extended to 32-bits by copying the sign bit to all the bits that were previously not significant, as shown in Figure 3.2. The sign extension is performed on the value in the Areg and the result is placed in the Areg. The other registers are not affected.
xbword extends a signed byte to a word by copying bit 7 into bits 8 to 31. xsword extends a signed half-word to a word by copying bit 15 into bits 16 to 31. lsxinc loads a sixteen bit value, sign extends it to 32 bits and increments the address by two bytes. This is the same as: lsinc; xsword;
40/205
(R)
4 Using ST20-C1 instructions
4.5
Forming addresses
The addressing instructions provide access to items in data structures using short sequences of single byte instructions. These instructions are listed in Table 4.8.
Mnemonic Name load local pointer load non-local pointer load pointer to instruction word subscript Meaning Load the value Wptr + 4n. Load the value Areg + 4n. Load the value Iptr + n. Load the value Areg + 4.Breg.
ldlp n ldnlp n ldpi n wsub
Table 4.8 Addressing instructions 4.5.1 The address of a variable
The absolute address of a local work space location is loaded using the ldlp primary instruction. ldlp 0 can be used to load the value in the Wptr. The ldnlp primary instruction is provided to calculate the absolute address of a nonlocal variable. The meaning of local and non-local is described in section 4.2. 4.5.2 The address of an instruction
The address of a location in the program being executed can be obtained by the ldpi operation as follows. The address of the location x bytes past the next instruction (which is itself pointed to by the instruction pointer register) can be pushed onto the evaluation stack by
ldc x; ldpi
For example, the address of a label L can be loaded by
ldc (L-M); ldpi M:
where the label M is the address of the instruction that follows the ldpi instruction. First the offset in bytes from M to L is loaded into Areg. The ldpi then uses this offset and the value in the instruction pointer register (which will be the address of label M) to load the address of label L into Areg. This technique is useful for generating relocatable code. Breg and Creg are unaffected. 4.5.3 Arrays
The wsub instruction interprets Areg as the address of the beginning of a vector of word-sized data objects, and Breg as an index into that vector. After execution, Areg holds the address of the indexed element, and Creg is popped into Breg, leaving Areg rotated into Creg. The operation performed by wsub is to multiply the integer in Breg by four and to add this to the address in Areg (without overflow checking).
41/205
(R)
4.5 Forming addresses Access to a component of an array can be split into two sections; first the address of the component must be constructed, and then the transfer of data to or from that component must be performed. Evaluating a subscript Array subscripts can be evaluated efficiently using the smul or mul instruction. If array A has been declared by int A[S1] . . . [Sn]; where Si (i = 1..n) are the dimensions, then one way of arranging this in memory is to have all elements of the array in a contiguous block. For the purposes of this section, suppose that the elements in the last dimension are stored adjacently; otherwise change the order of the dimension subscripts. For example Figure 4.1 shows the elements of a particular three dimensional array (Array) stored in this way.
int Array[2][2][3];
Increasing memory addresses
Array[1][1][2] Array[1][1][1] Array[1][1][0] Array[1][0][2] Array[1][0][1] Array[1][0][0] Array[0][1][2] Array[0][1][1] Array[0][1][0] Array[0][0][2] Array[0][0][1] Array[0][0][0]
Contiguous locations for words in memory space
Figure 4.1 A possible method of storing an array of integers If an access is required to the following array element
A[e1] . . . [en]
then the code to evaluate the subscript is
e 1; ldc S2; mul; e2; add; ldc S3; mul; e3; add; . .. ldc Sn; mul; en; add;
For example to evaluate the subscript for element Array[x][y][z], (where Array is declared as in Figure 4.1) the code sequence is
ld x; ldc 2; mul; ld y; add; ldc 3; mul; ld z; add;
If x is 1, y is 0 and z is 2, then this evaluates to 8, which as can be seen from Figure 4.1, is the correct offset from the base of the array.
42/205
(R)
4 Using ST20-C1 instructions Accessing a word addressed array Let Wa_ptr be a pointer to an array Wa that starts at a word boundary, and in which all component types are measured in words. Let e be a subscript expression. The address of component e of Wa is
e; Wa_ptr; wsub;
or if e is a constant expression this can be optimized to:
Wa_ptr; ldnlp e;
Accessing a byte addressed array Similarly, let Ba_ptr be a pointer to an array (Ba) which may start at any byte location, and in which each component type is measured in bytes. Let e be a subscript expression. The address of component e of Ba is:
e; Ba_ptr; add;
4.6
Comparisons and jumps
This section describes the arithmetical comparison instructions and their use in conditional program behavior. Unconditional jumps are also described. Functions and procedures are described in section 4.10, and evaluation of boolean expressions is described in section 4.7. Comparisons, conditional behavior and jumps are provided by the instructions listed in Table 4.9.
Mnemonic Name equal to constant greater than greater than unsigned order order unsigned conditional jump jump jump absolute
eqc n gt gtu order orderu cj n jn jab
Table 4.9 Comparison and jump instructions 4.6.1 Representation of true and false
The ST20 uses 0 as false and 1 as true. These values are generated by predicate operations (for example comparisons). They can be loaded with single byte load constant instructions.
43/205
(R)
4.6 Comparisons and jumps Implementation of languages with different representations of true and false It is easy to implement programming languages that use a different representation of true and false. For example, using
eqc X; not; adc 1
in place of eqc X and
gt; not; adc 1
in place of gt, does not affect the representation of a false result, but changes the representation of true to -1, which is used in some programming languages. 4.6.2 Comparison
The primary instruction eqc n loads Areg with a truth value -- true if Areg is initially equal to the instruction operand (n), false otherwise. Breg and Creg are unaffected.
gt and gtu take integer operands in Areg and Breg and produce a boolean result which is loaded into Areg. They also load the value in Creg into Breg, saving a copy of the initial Areg in Creg.
The gt instruction loads Areg with true if Breg > Areg, false otherwise, treating Areg and Breg as signed values. Similarly gtu loads Areg with true if the unsigned value of Breg is greater than the unsigned value of Areg; false otherwise. 4.6.3 Jump and conditional jump
There are two relative jump instructions; both are primary instructions. The unconditional jump instruction, j n, adds its operand (n) to the address of the instruction immediately following it and puts the result into Iptr, thus transferring execution to another part of the program. The conditional jump instruction, cj n, performs a jump if the value in Areg is 0 and does not affect the evaluation stack. If the value in Areg is not 0 cj rotates the value in Areg to the bottom of the evaluation stack and continues with the next instruction. Consequently cj n serves as `jump if false' provided that the language being implemented interprets 0 as false (see section 4.6.1). 4.6.4 Conditional transfer of control
The conditional expressions used in a conditional branch of an if construct are compiled using the conditional jump. The statement: if (E) { P } This compiles to:
E; cj L; P; j ENDIF; L:
44/205
(R)
4 Using ST20-C1 instructions where the label ENDIF: is at the end of the code for the if construct. The compilation of a while loop is shown by the following example. while (E) { P } This compiles to:
L: ENDWHILE:
E; cj ENDWHILE P; timeslice; j L
Note that this loop includes a timeslice instruction. This causes the current process to be descheduled if a timeslice is due and timeslicing is enabled. The presence of this ensures that the process cannot occupy the CPU for too long provided timeslicing is enabled. It is good practice for multi-tasking programs to include a timeslice instruction in every loop. Timeslicing is described in section 7.4. Single task programs do not need to timeslice, but should have timeslicing disabled, so the timeslice instruction has no effect. A repeat .. until loop is shown by the following example. repeat { P } until E This compiles to:
L: K: END:
4.6.5
jK E; eqc 0; cj END P; timeslice; j L
Ordering instructions
Two instructions are provided to select the smaller of two values. If Breg is smaller than Areg as a signed integer, order will swap Areg and Breg; otherwise order will have no effect. This can be used to find the minimum of two signed variables:
ldl a; ldl b; order; stl minimum; stl maximum
Similarly orderu can be used to find the minimum or maximum of two unsigned values
4.7
Evaluation of boolean expressions
This section describes the operations using the logical true and false values, as used with the conditional jump cj. Conditional behavior and comparisons are described in section 4.6. Bitwise boolean operations are described in section 4.8. General issues concerning expression evaluation are discussed in section 4.3.
45/205
(R)
4.7 Evaluation of boolean expressions The following shows the correspondence between C logical expressions and ST20-C1 instructions. X and Y represent expressions,and K represents a constant. The symbol `' is a logical NOT (see section 4.7.1).
true false !X X == Y X != Y X == K X != K X>Y X= Y X <= Y
= = = = = = = = = = =
ldc 1 ldc 0 (X) X; Y; sub; eqc 0 (X; Y; sub; eqc 0) X; eqc K (X; eqc K) X; Y; gt Y; X; gt (Y; X; gt) (X; Y; gt)
Further optimizations can be made to the `not equals' comparison when followed by a conditional jump.
X != Y; cj L X != 0; cj L
4.7.1
= =
X; Y; sub; cj L X; cj L
Evaluation of NOT
If zero represents false and 1 represents true, then logical NOT can be performed by eqc 0. 4.7.2 Evaluation of AND and OR
For evaluation of logical AND and OR operations, the instruction sequence depends on whether strict or non-strict evaluation is used, i.e. whether both operands are always evaluated. This is important if side-effects may occur, such as a trap, or if the second operand is not always defined, as in: if ((ptr != NULL) && (ptr->tag == TAG_VAL)) ... In this example, ptr->tag is not defined if ptr is NULL. For languages such as ANSI C, non-strict evaluation is required, so the following short-cuts must be used:
X OR Y X AND Y
= =
((X); cj L; (Y); L:) X; cj L; Y; L:
For non-strict evaluation, the following laws should be applied to the compilation of conditional expressions before code is generated to ensure that the jump is taken as early as possible: (X AND Y) (X OR Y) (X OR Y); cj L (X AND Y); cj L = = = =
(X) OR (Y) (X) AND (Y) (X); cj M; Y; cj L; M: X; cj L; Y; cj L
[= (X; cj L; Y; L:)] [= (X); cj L; (Y); L:]
46/205
(R)
4 Using ST20-C1 instructions In other languages, evaluation of boolean expressions may be strict (for example, ADA gives the programmer the choice) and so both expressions in dyadic logical operations may need to be evaluated. Where false is represented by 0, and true is represented by any fixed bit pattern other than 0 (e.g. true is always 1, or true is always -1), then the following transformations apply:
X OR Y X AND Y
= =
X BITOR Y X BITAND Y
and the bitwise instructions given in section 4.8 can be used: Note that even for some non-strict evaluations, the above sequence may be preferable. Where Y is a simple boolean expression such as a local variable, its evaluation does not cause any side-effects, and so it does no harm to implement a non-strict evaluation using a bitwise operation.
4.8
Bitwise logic and bit operations
Mnemonic Name and or exclusive or bitwise not bit load bit store bit mask memory read modify write
Bitwise logic and bit operations are provided by the instructions listed in Table 4.10.
and or xor not bitld bitst bitmask rmw
Table 4.10 Bitwise logic and bit instructions The not operation has only one operand that is taken from Areg. The result of this, which is a bitwise inversion of all bits in the operand, is loaded into Areg, leaving Breg and Creg unaffected.
and, or and xor are bitwise logical operations on two operands that are taken from Areg and Breg. For each, the result is loaded into Areg. The data previously held in Creg is popped into Breg and the initial Areg is left in Creg. These operations are commutative. bitld, bitst and bitmask are used for setting, clearing and testing bits of a word. bitld returns the value of a single bit from a value in Breg, bitst sets or clears a single bit and bitmask creates a mask with a single bit set. In each case the bit number is initially in Areg and the result is put in Areg. For bitld and bitst, the value containing the bit to be tested, set or cleared is initially in Breg.
47/205
(R)
4.9 Shifting and byte swapping 4.8.1 Memory bit test and clear or set
Bits of a word in memory may be tested and set or cleared by the instruction rmw. The address of the memory word is held in Areg and a bit masks in Breg and Creg. rmw clears the bits of the memory word that are set in Creg, and then sets the bits of the memory word that are set in Breg.The initial memory word is loaded into Areg, with Areg pushed down to Breg and Breg pushed down to Creg.
4.9
Shifting and byte swapping
The shift and byte swapping operations are provided by the instructions listed in Table 4.11.
Mnemonic Name shift left shift right arithmetic shift right byte swap 32
shl shr ashr swap32
Table 4.11 Shifting and byte swapping instructions The shift operations (shl, shr and ashr) shift the operand in Breg by the number of bits specified b y the unsigned integer in Areg and put the result in Areg. shl and shr fill the vacated bit positions with zero bits, while ashr fills the vacated bits with copies of bit 31, which is the original sign bit. If Areg is zero, the result is the initial value of Breg. When the value in Areg is greater than the number of bits in the object being shifted, the result of the operation is undefined. The data previously held in Creg is popped into Breg, and the initial Breg is left in the Creg.
swap32 reverses the order of the bytes in Areg by swapping byte 0 with byte 3 and swapping byte 1 with byte 2.
4.10 Function and procedure calls
The function and procedure call operations are provided by the instructions listed in Table 4.12.
Mnemonic Name function call jump absolute adjust work space general adjust workspace
fcall jab ajw gajw
Table 4.12 Function and procedure instructions The primary instruction fcall n calls a function or procedure. It stores the instruction pointer (which holds the return address) in the word pointed to by the Wptr. The operand to the call - n - is added to the address of the next instruction to produce the address of the first instr uction of the procedure or function being called. Since the call address is relative, the code is relocatable.
48/205
(R)
4 Using ST20-C1 instructions A function called using fcall must have a fixed offset at compile time. jab is used for calling functions and procedures at dynamically calculated addresses, for example when using function pointers. The jab instruction is also used to perform the return. The return address must have been restored with a ldl from the stack into the Areg. A procedure or function that requires local work space will normally include ajw instructions to allocate and deallocate space. When the jab instruction is executed, the programmer must ensure that: * * the Areg holds the return address; any workspace claimed by the procedure should have been released so that the Wptr has returned to the value it held at the start of the procedure.
The jab instruction uses one word of the evaluation stack, so the other two words can therefore be used to return up to two values to the calling code, including a pointer to a block of additional data to be returned. 4.10.1 Adjusting work space The primary instruction ajw is used to perform a relative adjustment to the stack pointer Wptr to: * * create work space on the stack at the beginning of the function and return the work space pointer at the end of the function.
ajw n increases the value of the workspace pointer by the number of words in its operand value, n. Work space is created at the beginning of a function or procedure with a negative operand and released before returning with a positive operand.
The amount of extra work space needed will normally include: * * * space to save any parameters passed in the evaluation stack; space for local variables and temporaries; space for any hidden system variables such as the static chain.
For example, a function with w words of local work space might be: T myfunction (param_1, param_2) { local variable declarations; P; return (E); } This can be compiled as:
ajw -w; stl param_1; stl param_2; P; E; ajw w;
49/205
(R)
4.10 Function and procedure calls
ldl 0; jab;
ST20-C1 processors may have a workspace cache which holds a copy of a few words at the bottom of the work space. This cache is transparent to the programmer but may substantially improve performance. It is refilled whene ver the Wptr is adjusted, so the ajw instruction should not be used excessively. 4.10.2 Parameters It is convenient to load the first three parameters of the procedure or function into the evaluation stack registers, and to arrange the work space of the calling code so that the additional parameters can be stored in locations 1, 2, ... of the work space before the procedure is called. Location zero of the work space is used for the return address. This is illustrated in Figure 4.2, which shows a possible work space layout for a function or procedure with six parameters and four local variables.
Wptr in calling code
Wptr in function or procedure
param 6 param 5 param 4 Return Iptr param 3 param 2 param 1 variable 1 variable 2 variable 3 variable 4
Figure 4.2 Example function or procedure workspace To enable the procedure to access non-local variables the parameters of a procedure may include a link to the environment in which the procedure was declared. 4.10.3 Returning results Up to two results of size less than or equal to the word length of the processor can be returned from a function in the evaluation stack -- the jab instruction uses the third register. Further results, or results larger than the word length, can be returned by passing into the function the addresses of locations to store these results as extra parameters.
50/205
(R)
4 Using ST20-C1 instructions A C function is used for purposes of illustration. For simplicity, it is assumed that the single result can be returned in the evaluation stack:
ajw -local_variables-1; P; E; ldl local_variables; ajw local_variables+1; jab;
One of the loading sequences described earlier may be required if the expressions returned in the registers contain evaluations. 4.10.4 Calling a function The first three par ameters should be loaded into the evaluation stack before the fcall instruction. These parameters can be stored as local variables after the workspace pointer has been moved down, to make the best use of the work space cache. The remainder of the parameters passed should be loaded into the work space before fcall is executed. When the function returns, the results whose addresses were passed will already have been stored so all that remains is to store up to 2 results returned in the evaluation stack. For example the function call
V = F(E1, . . ., En)
could be compiled by
E3; stl 0; . . . ; En; stl (n-3); E2; E1; static_link; fcall F; stl V;
The compiler must have already allocated sufficient workspace for the parameters that are stacked explicitly. Single result functions In most programming languages, a function that returns a single result can be used in an expression as well as in an assignment. A common form of function returns a single value contained in a word -- the mechanism described above will return this in Areg. When compiling expressions, (using the algorithm described in section C.3 in Appendix C) the depth of such a function call should be taken as being infinite -- i.e. deeper than any other form of expression. This is because the function call will always lose any other information in the registers. By giving it infinite depth the expression compilation algorithm will never call a function while another expression result is being held in a register.
51/205
(R)
4.11 Peripherals and I/O 4.10.5 Other work space allocation techniques The gajw instruction exchanges the contents of Wptr and Areg, allowing work spaces to be allocated dynamically, and allowing dynamic switching between existing work spaces. If a process work space holds a pointer to a new work space, then the following code changes to the new work space and stores a pointer to the old work space.
ldl Wnew; gajw; stl Wold;
The old work space can be restored by
ldl Wold; gajw;
In addition, the old work space can be accessed from the new work space, using
ldl Wold; ldnl x; ldl Wold; stnl x; ldl Wold; ldnlp x;
4.11 Peripherals and I/O
The peripheral and I/O instructions are listed in Table 4.13.
Mnemonic Name input / output bit mask load task descriptor stop process
io bitmask ldtdesc stop
Table 4.13 Peripheral and I/O instructions 4.11.1 Using the IO register The IO register is a 32-bit register used for simple bit control of devices outside the core. The bits of the register are directly mapped to external connections on the ST20C1 core. The connections to and from the IO register may be to on-chip or external peripherals depending on the particular chip design. The bits of the IO register are defined in Table 3.3. Some bits at the most significant end of each half word may be reserved for the system in some ST20 variants; see the data sheet for the variant. Setting an output bit will cause the corresponding connection to be driven high, and clearing the bit will drive the connection low. Similarly any input or output bit may be tested for the state of the connection; if the connection is high the bit will be set and if the connection is low the bit will be clear. The instruction io sets and clears bits of the IO register and loads a copy of the initial IO register. A bit in the bottom half-word may be set in the IO register by:
ldc 0; ldc bit_number; bitmask; io;
A bit in the bottom half-word may be cleared in the IO register by:
ldc bit_number; bitmask; ldc 0; io;
52/205
(R)
4 Using ST20-C1 instructions Any bit may be read from the IO register by:
ldc 0; dup; io; ldc bit_number; bitld;
The IO register is global and is not changed or saved by a context switch. If more than one process accesses the IO register then it may need to be protected by a semaphore. On reset the IO register is set to all zeros. 4.11.2 Memory-mapped peripherals On-chip peripherals may have memory-mapped registers in the address space. Access to these registers is performed in the same way as accessing memory. If a peripheral has a block of word-aligned registers with base address peripheral then a register with word offset register may be read by:
ld peripheral; ldnl register;
and value may be written to the register by:
ld value; ld peripheral; stnl register;
4.11.3 Channel-type peripherals Some peripherals, for example peripherals using DMA (direct memory access), may use a channel-type control model. This section describes how to use such peripherals, which use a micro-interrupt to notify the CPU that an assigned job is completed. This type of peripheral works best with a multi-tasking program, so that the CPU has other processes to execute while the peripheral is busy. However, if multi-tasking is not otherwise required, then an interrupt model can be used. Multi-tasking is described in Chapter 7 and interrupts and the exception vector table are described in Chapter 6. Multi-tasking The principle of using the channel model with multi-tasking is that the CPU tells the peripheral to start a job and then deschedules the current process. The job might be peripheral input/output or DMA transfer. This allows the CPU to continue executing other processes while the job is in progress. When the peripheral completes the job it signals to the CPU, which reschedules the process. To enable this to happen, the task descriptor of a user process can be entered into the exception vector table. This entry is called the peripheral channel. The peripheral signals a micro-interrupt, which interrupts the CPU with the exception level associated with the user process. The CPU recognizes that the exception vector table entry is a user process because bit zero is UserProcessType, and either adds the process to the end of the scheduling queue or takes a schedule exception trap if installed. The scheduling exception trap allows a scheduling kernel to control the rescheduling of the process. In more detail, the steps to perform a job using this model are:
53/205
(R)
4.11 Peripherals and I/O 1 2 3 4 5 6 7 The CPU saves the task descriptor of the current process in the exception vector table at the exception level for the peripheral. The CPU tells the peripheral the number of bytes to be read or written, the address of the start of the data or input buffer. The CPU signals to the peripheral that the job can start. The CPU deschedules the process, using the stop instruction, to wait for the peripheral to complete the job. The CPU executes other processes while the peripheral performs the job. The peripheral completes the job and sends a micro-interrupt to the CPU, with the exception level. The CPU reads the exception vector table and recognises the entry as a user process. The process is added to the back of the scheduling queue, or if the schedule execution trap is enabled then the trap is taken.
The code to execute steps 3 to 4 must not be interrupted, since otherwise the peripheral job may be completed before the process is descheduled by a stop instruction, which may crash the processor. Interrupts may be temporarily disabled by clearing the local_interrupt_enable bit of the status register, which is set automatically when the process is descheduled. Steps 5 to 7 happen automatically and need no coding. The code to drive the peripheral will depend on the peripheral and the interface to it. Typically the parameters (e.g. the byte count and the buffer address) would be written to memory mapped registers and then a further write would be needed to start the peripheral job. The code to perform a DMA to transmit param_count bytes to or from param_buffer using exception level except_level where the DMA registers periph_count, periph_buffer and periph_start are in a block at peripheral would be similar to the following:
ldtdesc; ld ExceptionBase; stnl except_level; ld param_count; ld peripheral; stnl periph_count; rev; ld param_buffer; stnl periph_buffer; ldc local_interrupt_enable; bitmask; statusclr; rot; ldc start_value; arot; stnl periph_start; stop;
The process will resume at the next instruction after the stop when the peripheral job is complete. Single tasking For programs which are not multi-tasking, it is not desirable to deschedule the program while the peripheral is busy. The main program should continue with other jobs while the peripheral is busy. An interrupt handler can be written to signal to the main program that the peripheral job is complete. The descriptor of the interrupt handler exception control block is placed in the exception vector table. The descriptor
54/205
(R)
4 Using ST20-C1 instructions is the address of the control block ORed with the type flag ExceptionProcessType in bit 0. The code to perform a DMA to transmit param_count bytes to or from param_buffer using exception level except_level where the DMA registers periph_count, periph_buffer and periph_start are in a block at peripheral and the interrupt handler is at except_control_block would be similar to the following:
ld except_control_block; ldc ExceptionProcessType; or; ld ExceptionBase; stnl except_level; ld param_count; ld peripheral; stnl periph_count; ld param_buffer; arot; stnl periph_buffer; ldc start_value; arot; stnl periph_start;
When the interrupt handler starts execution the peripheral job will be complete.
4.12 Status register
The status register may be manipulated using the instructions listed in Table 4.14.
Mnemonic Name clear bits in status register set bits in status register test status register
statusclr statusset statustst
Table 4.14 Semaphore instructions In each of these instructions, Areg holds a bit mask. statusclr copies the initial status register into the Areg and clears the bits of the status register that are set in the initial Areg. For example, to clear the bit bit_number:
ldc bit_number; bitmask; statusclr statusset is similar, but sets the bits of the status register that are set in the initial Areg. statustst returns in the Areg the status bits masked by the initial Areg. The Breg and Creg are unaffected.
The status register is described in section 3.3.2.
55/205
(R)
5.1 Data formats
5
Multiply accumulate
This section describes the multiply-accumulate instructions and their use. All these instructions are described in the context of their intended use. Instructions for general use (arithmetic, loading, storing etc.) are described in Chapter 4. Instructions for exceptions are described in Chapter 6 and multi-tasking instructions are described in Chapter 7. The architecture of the ST20-C1, including the registers and memory arrangement, is described in Chapter 3. Multiply accumulate operations are provided by the signal processing instructions listed in Table 5.1.
Mnemonic Name multiply accumulate unsigned multiply accumulate initialize short multiply accumulate loop short multiply accumulate loop biquad IIR filter step
mac umac smacinit smacloop biquad
Table 5.1 Multiply accumulate instructions
5.1
Data formats
A signed fractional number of N bits is described as x.y, where x+y=N. This means the number is made up from x bits before the binary point, an implied binary point, and y fractional bits. More details of the data formats are given in section 5.7.
5.2
mac and umac
mac and umac are general purpose multiply accumulate instructions, multiplying two 32-bit values and adding them to a 32-bit unsigned initial accumulator, giving a 64-bit accumulator. mac treats the multiplicands as signed and umac treats them as unsigned. Initially Areg and Breg hold the values to be multiplied and Creg holds the initial accumulator. On completion, Areg is the least significant word of the result accumulator, Breg the most significant and Creg holds a copy of the initial Areg.
5.3
Short multiply accumulate loop
The smacloop instruction performs a multiply-accumulate operation on two vectors of 16-bit values held in memory. It takes an initial accumulator value and two pointers, one to each of two data vectors. * The X vector of data values is normally considered to reside within a circular buffer of programmable size, but this can be turned off. When data fetches reach the end of this buffer, the pointer wraps-around back to the start of the buffer and continues. The X vector must be word aligned. The Y vector of coefficients is always in a flat address space , and never wraps around. The Y vector must be half-word aligned.
*
56/205
(R)
5 Multiply accumulate The data items from each vector are read from memory in turn and the products formed between corresponding pairs from the two vectors. Each of these products is added into the running accumulator value. The instruction completes with 3 values in the stack - the final accum ulator value and the two updated data pointers. Four control values are held in the status register, as shown in Table 5.2.
Field mac_count mac_buffer mac_scale mac_mode Size Meaning
8 bits The number of steps (from 1 to 256 items). 3 bits The size code for the data buffer within which the data vector lies. 2 bits Shift control for scaling coefficient v alues. 1 bit Accumulator format - 0 indicates 16-bit (short) and 1 indicates 32-bit (long) value.
Table 5.2 smacloop status register fields These values are initialized by the smacinit instruction. The smacinit instruction takes a packed control word in Areg, extracts the control fields and loads these into the status register. For smacinit, Areg is organized as shown in Table 5.3.
Field mac_count mac_buffer mac_scale mac_mode Size 8 bits 3 bits 2 bits 1 bit Least significant bit 0 8 11 13 Most significant bit 7 10 12 13
Table 5.3 smacinit Areg format These status register values are global and are not saved when a process is timesliced or descheduled. If more than one process is performing short multiply accumulate loops then the values should be reloaded by the process code using smacinit after each timeslice and stop instruction. 5.3.1 X buffer size
The X vector buffer size is determined by the mac_buffer control field, which may take the values 0 to 7. When mac_buffer is 0, then no address wrapping takes place, i.e. the buffer is assumed to be of infinite size. Otherwise, the buffer size is 2mac_buffer+2, as shown in Table 5.3. The X buffer must be aligned to a multiple of its own size, so a buffer of N bytes must start at an address whose value is a multiple of N bytes. 5.3.2 Number of steps
The mac_count control field in the status register deter mines the number of multiplyaccumulate steps for smacloop. This is an unsigned 8-bit integer. The value zero signifies 256 steps; otherwise the value of mac_count is the number of steps.
57/205
(R)
5.4 Biquad IIR filter
mac_buffer 0 1 2 3 4 5 6 7
Buffer size (data items) infinite 4 8 16 32 64 128 256
Buffer size (bytes) infinite 8 16 32 64 128 256 512
Table 5.4 mac_buffer coding 5.3.3 Scaling
Scaling of the input data in the X vector is controlled by the mac_scale field of the status register, as described in section 5.6. 5.3.4 Accumulator format mode
smacloop supports two data formats for the initial and final accum ulator value. If mac_mode is 0 then Q15 (sign extended to 32 bits) is used, and the mode is said to be ShortMode. If mac_mode is 1 then Q31 is used, and the mode is said to be LongMode.
5.4
Biquad IIR filter
The biquad instruction performs a fixed sequence of 5 multiply-accumulates. Figure 5.1 shows an example using Q14 format. The parameters to the instruction are pointers to three vectors of 16-bit values: * * * an X input data vector, a Y results vector and a coefficient vector C. This vector must be word-aligned.
biquad calculates the next item in the Y vector according to the following formula, and writes this to memory, incrementing the X and Y pointers by two bytes:
Y[2] = X[0].C[0] + X[1].C[1] + X[2].C[2] + Y[0].C[3] + Y[1].C[4] The X and Y vectors must be either both word-aligned or neither word-aligned. The C pointer is left unchanged. This allows successive biquad instructions to be executed back-to-back to generate a set of filter outputs with no additional overhead.
biquad scales the X input data according to the mac_scale field of the status register, as described in section 5.6. The mac_scale field ma y be set using smacinit, as described in section 5.3.
58/205
(R)
5 Multiply accumulate
5.5
Data vectors
Both biquad and smacloop operate on arrays of 16-bit values, packed two per word. This allows the ST20-C1 to read two values per cycle from memory which is fundamental to the high performance of the multiply-accumulate instructions. In all cases, data values must be half-word aligned.
c[2] Input
Q14
<< 9
<< 22
1 Output y[2]
Q15
c[1] Z-1
x[2]
Q14
<< 9
>> 23
Q15
c[4] Z-1
<< 9
Q14
x[1] c[0]
-1
y[1] << 9 << 9
Q14
Q14
c[3]
-1
Z
Z x[0] y[0]
Shifts
Coefficients: 0, L4, L8, L9
Accumulator: R23
Figure 5.1 ST20-C1 biquad instruction example: Q15 = Q15 x Q14
5.6
Scaling
The biquad and smacloop operations are performed with an oversize accumulator of 48 bits. The accumulator value is always sign-extended to the full width of the accumulator. During a multiply-accumulate sequence the value in the accumulator may temporarily go outside the representable range of the final result, but can never overflow the accumulator for a single biquad or smacloop. 5.6.1 Accumulator scaling
The user-visible accumulator is either in LongMode (Q31) or ShortMode (Q15). For smacloop, the mode is defined b y the mac_mode status register field. biquad only supports ShortMode. Pre-scaling converts the user-visible accumulator to an internal format accumulator, as shown in Figure 5.2. The inverse operation is post-scaling which is converting an
59/205
(R)
5.6 Scaling internal format accumulator to a user-visible accumulator.
ShortMode and biquad 15
0 User-visible accumulator
47
38 1
0 Internal accumulator Bit initially set for rounding Implied binary point
LongMode
31 0 User-visible accumulator 47 38 1 0 Internal accumulator
Figure 5.2 Accumulator scaling The accumulator is left-shifted 8 bits so that the assumed binary point is moved from below bit 30 to below bit 38. The accumulator value is saturated from bit 38 upwards. At the end of the multiply-accumulate sequence the accumulator value is shifted down (right) by an extra 8 bits to compensate for the left shift of 8 on coefficient inputs. 5.6.2 Coefficient scaling
The standard data format for data and coefficients is 1.15 (Q15). The product of two 1.15 numbers is 2.30. The coefficient value for each multiply-accumulate operation is pre-scaled before being fed into the multiplier. The shift distances are controlled by the mac_scale field: of the status register, as shown in Table 5.5.
mac_scale 0 1 2 3 coefficient shift 0 left 4 left 8 left 9
Table 5.5 mac_scale values The standard behavior for 1.15 (Q15) values is to shift the coefficients b y 8 places, using mac_scale set to 2, which exactly compensates the extra right shift of 8 on the
60/205
(R)
5 Multiply accumulate final accum ulator. This is shown in Figure 5.3.
X data value 15 0 15
Y data value 0
<< 8 23 8 0
47
39
8
0
Add to internal accumulator
Figure 5.3 Coefficient scaling with mac_mode set to 2 Shifting the coefficient b y 1 extra place (i.e. 9 places) is used to normalize a 2.14 (Q14) coefficient to the correct position for the binary point. This is shown in Figure 5.4.
X data value 15 0 15
Y data value 0
<< 9 24 9 0
47
40
9
0
Add to internal accumulator
Figure 5.4 Coefficient scaling with mac_mode set to 3
61/205
(R)
5.6 Scaling Under shifting (by less than 8) is used for magnitude reduction (most suitable for smacloop) by 4 bits (x16) using mac_scale=1 or 8 bits (x256) using mac_scale=0. Under-shifting by 4 bits is shown in Figure 5.5, and by 8 bits in Figure 5.6.
X data value 15 0 15
Y data value 0
<< 4 23 4 0
47
35
4
0
Add to internal accumulator
Figure 5.5 Coefficient scaling with mac_mode set to 1
X data value 15 0 15
Y data value 0
23
0
47
31
0
Add to internal accumulator
Figure 5.6 Coefficient scaling with mac_mode set to 0
62/205
(R)
5 Multiply accumulate 5.6.3 Pre-scaling and rounding - smacloop
The initial accumulator value must be loaded into the accumulator for the start of the smacloop instruction. The initial accumulator is either in Q31 format (LongMode), or Q15 format (ShortMode) and is handled accordingly. In LongMode, * * * * * * left shift by 7 places (right 1 + left 8) sign extend from bit 38 to the most significant bit set bit 6 = 1 (for rounding) left shift by 23 places (left 15 + left 8)
In ShortMode, sign extend from bit 38 to the most significant bit set bit 22 = 1 (for rounding)
Note that rounding is achieved by adding half of the least significant bit to the initial value, which by associativity is equivalent to adding it to the final value. 5.6.4 Pre-scaling and rounding - biquad
The biquad instruction starts with an empty accumulator. Since the result is always Q15 (equivalent to ShortMode), rounding is achieved by loading the accumulator with 222 (which is half of the least significant bit of the result). 5.6.5 Post-scaling and saturation - smacloop
At the end of a smacloop instruction the final accum ulator value is saturated and scaled to the appropriate format, according to mac_mode. If either the overflow or underflow bits in the status register are set, then the final value is set to the appropriate exceptional value from Table 5.6. Note that this does not involve testing the accumulator value, which is considered to be invalid if either of these status register bits are set. Otherwise, when the status register reports no overflow or underflow, bits 38 to 47 inclusive of the accumulator are tested for mutual equality. If they are all the same, the accumulator value is well-formed and post-scaling is applied to produce the final accumulator value. In LongMode, * * * * arithmetic right shift by 7 places(left 1 + right 8) truncate to low 32 bits arithmetic right shift by 23 places (right 15 + right 8) truncate to low 32 bits
In ShortMode,
63/205
(R)
5.6 Scaling However if the bits are not all equal, then an error has occurred. If the most significant bit (bit 47) is zero then the overall result is positive, so the error is overflow. If the most significant bit is 1 then the o verall result is negative, so the error is underflow. The appropriate status register bit (overflow or underflow) is set accordingly, and the final value is taken from Table 5.6. 5.6.6 Post-scaling and saturation - biquad
The result of a biquad is always short (16-bits). The saturation test is from bit 38 upwards, and the overflow/underflo w flags are set as required. Note the initial value of the overflow/underflo w flags is not taken into account. If a saturation error occurs, the appropriate ShortMode value from Table 5.6 is used. If no error occurs, the scaling is: * * 5.6.7 arithmetic right shift by 23 places truncate to low 32 bits Error: load exceptional value
There are four exceptional values based on whether the result is to be delivered as a ShortMode (Q15) or LongMode (Q31) value, and whether the error was overflow or underflow:
mac_mode Overflo w 00007FFF 7FFFFFFF Underflo w FFFF8000 80000000
ShortMode LongMode
Table 5.6 Exceptional values Note that in all cases the final value is placed in a 32-bit register (Areg). 5.6.8 Performance and interrupts
biquad
The biquad instruction takes 11 cycles to execute, assuming single cycle memory accesses. The ST20-C1 cannot be interrupted during this period.
smacloop
A smacloop of n steps takes n+4 cycles to complete, assuming single cycle memory accesses. The ST20-C1 cannot be interrupted during this period, which may be up to about 8 microseconds (for single cycle memory and an operating frequency of 33 Mhz). The user may split a long multiply accumulate into a set of shorter ones passing the intermediate accumulator value from one to the next. This will reduce interrupt latency, but loses some numerical accuracy in that within a single smacloop intermediate values are held to 48 bits of precision, while values passed from one smacloop to another have at most 30 bits of precision (and are saturated).
64/205
(R)
5 Multiply accumulate
5.7
Data formats
This section gives details of the data formats used by the smacloop and biquad instructions. A signed fractional number of N bits is characterized as x.y, where x+y=N. This means the number is made up from x bits before the binary point, an implied binary point, and y fractional bits. Some examples are listed in Table 5.7.
Total bits 32 16 16 Range -1 n < 1 -1 n < 1 -2 n < 2 Format 1.31 1.15 2.14 Short name Q31 Q15 Q14
Table 5.7 Example data formats 5.7.1 Range
A value n in x.y format has a range -2x-1 n < 2x-1. For example, a value in the format 1.15 has range -1 n < 1, and the format 2.14 has range -2 n < 2. 5.7.2 Multiplication
The characteristic of the product of two fractional values is given by: a.b * c.d = a+c.b+d 5.7.3 Supported formats
Table 5.8 shows the data formats for multiply-accumulate operations supported by the ST20-C1.
Description Signed 16-bit fractional Signed 16-bit fractional Signed 16-bit fractional Name Q14 Q15 Q31 Format 14 significant fr actional bits 15 significant fr actional bits 31 significant fr actional bits
Table 5.8 Supported multiply accumulate data formats Note that a Q15 value may be (optimally) stored in a 16-bit field, or a wider (>16-bit) field with redundant sign bits. The two storage methods are described below. 5.7.4 Q15 in 16 bits
Q15 format is a 16-bit signed fractional value in the range -1 n < 1. The value is stored in two's-complement form with a sign bit (bit 15), an implied binary point between bits 15 and 14, and 15 significant fractional bits (bit 14 to bit 0).
65/205
(R)
5.7 Data formats A Q15 value in a 16-bit field is organiz ed as shown in Figure 5.7.
Most significant Bit position 15 -(20) 14 2-1 13 2-2 12 2-3 Least significant 1 2-14 0 2-15
binary point sign bit
Figure 5.7 Q15 data format (16-bits) The minimum and maximum representable values are as shown in Table 5.9.
Hex Value Minimum Maximum 8000 7FFF Decimal Value -1 +0.9999
Table 5.9 Q15 limiting values 5.7.5 Q15 in an oversized field
When a Q15 value is stored in an oversized field, e .g.a 32-bit register, the significant bits are placed at the least significant end of the field, and the v alue is sign-extended to the width of the whole field. The value still has the same range (-1 n < 1) and the same number of significant bits. A Q15 value in a 32-bit field is organiz ed as shown in Figure 5.8.
Most significant Bit position 31 30 29 16 15 14 2-1 Least significant 2 2-13 1 2-14 0 2-15
-(20) -(20) -(20)
-(20) -(20)
sign extended sign bit
binary point Basic Q15 value
Figure 5.8 Q15 data format (32-bits) The minimum and maximum representable values are as shown in Table 5.10.
66/205
(R)
5 Multiply accumulate
Hex Value Minimum Maximum FFFF8000 00007FFF
Decimal Value -1 +0.9999
Table 5.10 Q15 limiting values Memory access A Q15 stored in memory may be loaded to Areg with a lsxinc instruction which will automatically sign-extend the value. Similarly a Q15 may be written to memory with a ssinc instruction (which will discard the top 16 bits). Saturation A well-formed Q15 value is sign extended to the width of the field, and so bits 15 to 31 inclusive will all be identical. Conversely, if the bits from 15 to 31 inclusive are not all identical, then the value is not well-formed. It has either overflowed (if b31=0, so positive) or underflo wed (if bit31=1, so negative). A Q15 value in Areg may be saturated with the sequence:
ldc 00007FFF; order; ldc FFFF8000; order; rev;
5.7.6 Q31 format
Q31 format is a 32-bit signed fractional value in the range -1 n < 1. The value is stored in two's-complement form with a sign bit (bit 31), an implied binary point between bits 31 and 30, and 31 significant fractional bits (bit 30 to bit 0). A Q31 value in a 32-bit field is organiz ed as shown in Figure 5.9.
Most significant Bit position 31 -(20) 30 2-1 29 2-2 28 2-3 27 2-4 26 2-5 Least significant 2 2-29 1 2-30 0 2-31
binary point sign bit
Figure 5.9 Q31 data format The minimum and maximum representable values are as shown in Table 5.11.
67/205
(R)
5.7 Data formats
Hex Value Minimum Maximum 80000000 7FFFFFFF
Decimal Value -1 +0.99999999
Table 5.11 Q31 limiting values
68/205
(R)
5 Multiply accumulate
69/205
(R)
6
Exceptions
This chapter describes exceptions and how to use them. The architecture of the ST20-C1, including the registers and memory arrangement, are described in Chapter 3. A full list of constants and data structures is given in Appendix A. An exception is an exceptional event detected by the ST20-C1 core, which causes a context switch from the normal flow of an executing program. The event triggering the exception may be generated by software inside the core, in which case it is called a trap. Otherwise, the event may be a hardware signal from outside the core, in which case it is called an interrupt, except that the interrupt may attempt to perform a scheduling action, which may cause a trap. When an exception occurs the CPU changes context to an exception handler, which is a section of code only executed when an exception occurs. The process state registers (Areg, Breg, Creg, Iptr, Wptr, Status and Tdesc) are saved while the exception handler is running, and restored when it returns. Exception handlers for traps are called trap handlers and exception handlers for interrupts are called interrupt handlers. Normal processes which are not exception handlers are known as user processes. The exception handler is a transient process. Each exception handler starts execution with a standard initial state, runs to completion and terminates with an empty workspace. When the triggering event occurs again, the handler is restarted from its standard initial state and again runs to completion and terminates. Exception handlers may be nested to arbitrary depth, but they are not re-entrant, so care should be taken to ensure that the exception which caused the handler to run cannot occur while the handler is running, trapped or interrupted. The nesting of exceptions is illustrated in Figure 6.1.
User process Exception 1 taken - state of user process saved Exception 1 executing Exception 2 taken - state of exception 1 saved Exception 2 executing Exception 2 returns - state of exception 1 restored Exception 1 executing User process Exception 1 returns - state of user process restored
Figure 6.1 Nested exceptions
70/205
(R)
6 Exceptions Exception handler code is completed by executing the eret instruction, which restores the state of the interrupted or trapped process. When an interrupt handler executes eret, it also signals to the interrupt controller that the interrupt has completed. This allows the interrupt handler to start a lower priority waiting interrupt if required. The exception instructions are listed in Table 6.1.
Mnemonic Name exception call exception return breakpoint
ecall eret breakpoint
Table 6.1 Exception instructions
6.1
Exception levels
All exception handlers are identified by an integer called the exception level. Exception levels 0 to HighestException (255) are available for user-defined exceptions, while system exceptions have negative levels, as defined in Table 6.2.
Exception level 0 - 255 -1 -2 -3 -4 -5 -6 -7 el_breakpoint_trap el_illegal_instr_trap el_idle_trap el_schedule_exception_trap el_run_trap el_stop_trap el_timeslice_trap Name Circumstances when taken if not null Interrupt, system call, DMA user process.
breakpoint instruction executed or DCU breakpoint request.
Illegal op-code encountered. CPU becomes idle. Schedule a user process as an exception. Execute a run instruction. Execute a stop instruction. Take a timeslice.
Table 6.2 Exception levels Any exception may be triggered from software with the ecall instruction. User-defined exceptions can be interrupt handlers, user processes waiting for DMA peripherals or trap handlers used as system calls by executing ecall. System exceptions are traps which may be triggered automatically when the CPU is in certain states. They are intended mainly for operating system kernels to trap scheduling events and for debuggers to trap breakpoints. The circumstances in which each system trap is taken are as follows: * el_breakpoint_trap This trap is taken when either a breakpoint instruction is executed or a diagnostic controller (DCU) signals to the CPU requesting a breakpoint. If the trap is null then the process continues. This trap is used by debuggers.
71/205
(R)
6.2 Exception vector table * el_illegal_instr_trap This trap is taken when the CPU encounters an instruction with an illegal opcode. If the trap is null then the instruction is treated as a nop. * el_idle_trap This trap is taken when the CPU becomes idle, i.e. the current process executes a stop when there are no active processes waiting for CPU time, or a timeslice is trapped or interrupted so that there are no active processes waiting when the CPU attempts to start the next process. If the trap is null then the CPU waits for an interrupt or for a process to be scheduled. This trap is used by software scheduling kernels. * el_schedule_exception_trap This trap is taken when an interrupt is received from a peripheral and the exception level is assigned to a user process, which will generally be descheduled waiting for the peripheral to complete a job. If the trap is null then the user process is queued. This trap is used by software scheduling kernels. * el_run_trap This trap is taken when the run instruction is executed. If the trap is null then the CPU adds the process to the back of the scheduling queue. This trap is used by software scheduling kernels. * el_stop_trap This trap is taken when the stop instruction is executed. If the trap is null then the current process is descheduled and the CPU starts executing the process on the front of the scheduling queue, or goes idle if there is none. This trap is used by software scheduling kernels. * el_timeslice_trap This trap is taken when a timeslice is due and enabled and the timeslice instruction is executed. If the trap is null then the current process is timesliced, i.e. placed on the back of the scheduling queue. This trap is used by software scheduling kernels. Scheduling and timeslices are discussed in Chapter 7. Using user processes as exceptions to handle peripherals is described in section 4.11.
6.2
Exception vector table
The exception vector table maps each exception level to a user process or exception. The base of the exception vector table is at the fixed address ExceptionBase (#80000040) in on-chip memory, and the exception level is the word offset from the ExceptionBase to the vector. Thus the exception level is used as an index into the exception vector table. The address of the user process or exception is always word aligned, and so has bits 0 and 1 set to zero. The entry in the exception vector table is a descriptor, which consists of the address of the user process or exception ORed with a type. The type is
72/205
(R)
6 Exceptions bit 0 of the descriptor, and so can be either ExceptionProcessType (which has the value 1) or UserProcessType (which has the value 0). The type allows each exception level to be assigned to one of the following: * an exception handler. The entry for an exception handler is the address of the exception control block (as described in section 6.3) bitwise ORed with ExceptionProcessType to indicate an exception. a user process waiting for a peripheral (as described in section 4.11).The entry for a user process is the task descriptor (i.e. the address of the process control block) bitwise ORed with UserProcessType to indicate a user process. the null entry NotProcess. The CPU treats null entries as disabled exceptions.
*
*
When the exception is triggered, the CPU looks in the table entry for the requested exception level. If the value in the table is NotProcess then no exception or trap is taken and the CPU continues with the default behavior. Otherwise, if bit 0 is UserProcessType then the address part of the value is assumed to be a valid task descriptor of a process. If bit 0 is ExceptionProcessType, then the address part is assumed to be a pointer to a valid exception control block. Using user processes as exceptions to handle peripherals is described in section 4.11. The rest of this chapter refers only to exceptions. Typically a separate descriptor is used for each interrupt level and system call, so that this scheme provides a system of vectored interrupts and system calls. The exception handler is then executed, and can itself be interrupted or trapped.
6.3
Exception control block and the saved state
When an exception is taken, the state is saved in the exception control block. The evaluation stack, status register, Iptr, Wptr and Tdesc are automatically saved on taking the exception and restored on returning. This is done to enable any exception handler to have direct access to the state of the underlying process and is required for some software scheduler implementations. The constants in Table 6.3 define the locations in the control block.
Word offset 7 6 5 4 3 2 1 0 Name ex.HandlerIptr ex.InterptdStatus ex.InterptdTdesc ex.InterptdIptr ex.InterptdWptr ex.InterptdCreg ex.InterptdBreg ex.InterptdAreg Purpose Exception handler instruction pointer. Interrupted or trapped process status register. Interrupted or trapped process task descriptor. Interrupted or trapped process instruction pointer. Interrupted or trapped process workspace pointer. Interrupted or trapped process Creg. Interrupted or trapped process Breg. Interrupted or trapped process Areg.
Table 6.3 Exception control block
73/205
(R)
6.4 Initial exception handler state The control block is at the initial Wptr for the exception handler, so the locations are word offsets from the initial Wptr of the exception handler. The initial Wptr is the address of the exception control block which is the address part of the entry in the exception vector table.
6.4
Initial exception handler state
When the exception handler starts, Wptr is set to the address of the exception control block. The work space of the exception handler is normally below the control block, so like a function or procedure call, one of the first actions of the exception handler code is to adjust the Wptr downwards to create space for local variables using ajw. The Wptr must be adjusted back up again before the handler returns. The initial Iptr of the exception handler is the value loaded from ex.HandlerIptr in the exception control block. The state of the interrupted or trapped process is saved in the exception control block. If the exception is an idle trap then the saved state is the state of the last descheduled process. Initially the status register is set to the values shown in Table 6.4.
Field or bit mac_count mac_buffer mac_scale mac_mode global_interrupt_enable local_interrupt_enable overflo w underflo w carry user_mode interrupt_mode trap_mode sleep reserved start_next_task timeslice_enable timeslice_count Value As in the interrupted or trapped process. As in the interrupted or trapped process. As in the interrupted or trapped process. As in the interrupted or trapped process. False if exception is a trap, otherwise preserved. As in the interrupted or trapped process. False. False. False. False. True if any interrupts are running, false otherwise. True if exception is a trap, false otherwise. False. Undefined. False. False. As in the interrupted or trapped process.
Table 6.4 Exception handler initial status register If the exception handler is interrupting a user process, then the address of the exception control block (i.e. the user process state) is left in Tdesc. If necessary, the exception handler can save this address. If a sequence of nested interrupts has
74/205
(R)
6 Exceptions occurred then this is the only way that the nested interrupts can identify the state of the user process. If the exception is a schedule_exception trap then the trap handler also needs to know the process due to be scheduled. The descriptor of the process (as held in the exception vector table) is saved at SavedTaskDescriptor near the bottom of the address space.
6.5
Restrictions on exception handlers
Exception handlers cannot be queued, so they must not deschedule. This means that the following are not permitted inside exception handlers: * * the stop instruction; the timeslice instruction.
Exception handlers may be nested to arbitrary depth, but they are not re-entrant, so care should be taken to ensure that the exception which caused the handler to run cannot occur while the handler is running, trapped or interrupted.
6.6
Interrupts
All interrupts, whether from on-chip peripherals or external pins, are routed through an interrupt controller, which is normally an on-chip peripheral. The interrupt controller is responsible for arbitration between multiple interrupt signals. The design of the interrupt controller varies between ST20 variants. Typically, interrupt priorities are managed by the interrupt controller, which will usually track the priority of the highest level task currently executed by the core, and will interrupt the ST20-C1 again if a higher priority interrupt occurs. When an interrupt is requested by the interrupt controller, the ST20-C1 always changes context to the appropriate interrupt handler. An interrupt request is accompanied by an identifier for the interrupt handler, which is the exception level. The ST20C1 scheduler uses the exception level from the interrupt controller to start the appropriate interrupt handler. The CPU also sets the interrupt_mode bit in the status register and clears the trap_mode and user_mode bits. The interrupt_mode bit indicates that an interrupt handler is running, though it may have been trapped. When the interrupt handler executes the eret instruction, the CPU signals to the interrupt controller that the handler has returned. This allows the interrupt controller to keep track of which interrupt handlers are running, so that it can start a low priority waiting interrupt when a higher priority handler completes. If the interrupt controller requests an interrupt level which has a null interrupt handler then the CPU signals to the controller that the interrupt has completed.
75/205
(R)
6.7 Traps
6.7
Traps
The ST20-C1 has a system of traps which act as software generated interrupts. Any level of exception can be called as a user exception using ecall. This mechanism can be used for system calls to an operating system. In addition some special exception levels are reserved for system use to provide for the trapping of breakpoints, scheduling events, illegal operations and the machine becoming idle. The reserved `system' exception levels are described in section 6.1. If a system trap event occurs or a trap is called at an exception level which has a null trap handler then the CPU ignores the trap and continues. When a non-null trap handler is started, the trap_mode bit of the status register is set and the user_mode bit is cleared. The interrupt_mode bit is not altered. The trap_mode bit indicates that a trap handler is currently executing, so it is cleared if the trap handler is interrupted.
6.8
Setting up the exception handler
To create an exception handler, an exception work space area must be created, with enough space for the exception handler's stack and 8 words at the top of the work space for the exception control block, as defined in Table 6.3. For minimum interrupt latency, interrupt handler control blocks should be in fast memory, preferably on-chip. The normal work space and control block of an exception handler is shown in Figure 6.2. In the control block, ex.HandlerIptr must be initialized to point to the entry point of the exception handler code.
Exception Handler ex.HandlerIptr Interrupted or trapped state (7 words) Pointer OR 1 Initial exception handler Wptr Exception handler code entry point
Exception level (word offset)
Exception vector table ExceptionBase
Work space
Figure 6.2 Exception handler
76/205
(R)
6 Exceptions The code of the exception handler may access or modify the state of the interrupted or trapped process. The state of the interrupted or trapped process is stored in the exception control block, which is located at the initial Wptr of the exception handler. An exception handler must use eret to return to the interrupted or trapped process. 6.8.1 Enabling and disabling exceptions
When the exception handler has been created and initialized, the exception can be enabled. The address of the exception handler control block, ORed with 1 to set the exception type bit, must be written in the exception vector table at the level for the exception. To write an entry control_block with type ExceptionProcessType in the exception vector table, the following code may be used:
ld control_block; ldc ExceptionProcessType; or; ld ExceptionBase; stnl exception_level;
For interrupts, both the global_interrupt_enable and local_interrupt_enable bits in the status register must be set. In addition the interrupt controller may need to be initialized, including any interrupt enable bits and masks. A trap can be disabled by writing NotProcess into the exception vector table. Interrupts can be disabled in four ways: 1 2 3 4 Clearing the status register bit global_interrupt_enable disables all interrupts until the bit is set by an explicit write to the status register. Clearing the status register bit local_interrupt_enable disables all interrupts until the current process is descheduled. A single exception level can be disabled by writing NotProcess into the exception vector table. The interrupt controller will generally have a means of disabling interrupts individually or globally by writing to interrupt controller registers.
77/205
(R)
7.1 Processes
7
Multi-tasking
This chapter describes the features of the ST20-C1 core provided to support multitasking, and how to use them. The architecture of the ST20-C1, including the registers and memory arrangement, are described in Chapter 3. Interrupts and traps are described in Chapter 6. A full list of constants and data structures is given in Appendix A. Support is provided in the instruction set for timeslicing, scheduling processes and manipulating queues of processes.
7.1
Processes
A process (also known as a task or thread) is an independent unit of software with a single thread of control, i.e. a sequential algorithm. Any number of processes may be run. A process which has been started but not terminated may be in one of several different states: * * * *
executing on the CPU; interrupted or trapped by an exception; inactive, i.e. waiting for a peripheral or semaphore signal;
waiting for CPU time.
A process that is not inactive is said to be active. A process that is not executing or interrupted is said to be descheduled. The states and main transitions are shown in Figure 7.1. The scheduling transitions can be trapped so that a software scheduling kernel can modify the transitions and so change the scheduling behavior, for example by providing a system of process priorities.
Run Not started
Waiting for CPU time
Terminate Executing Timeslice Descheduled Interrupt or trap
Terminated
Descheduled processes
Active processes
Event
Return
Inactive
Interrupted or trapped
Figure 7.1 Process states and main transitions
78/205
(R)
7 Multi-tasking A process state is held in memory and the CPU registers. Sufficient of the register state must be saved when the process is interrupted or a context switch occurs, so that the process can be reloaded and continue execution at a later time. The register state consists of: * * * * * the instruction pointer register; the work space pointer register; the task descriptor register; the evaluation stack registers; the status register.
In order to save memory space and context switch time, processes are only descheduled when the evaluation stack and status register are empty. This is achieved by only allowing processes to deschedule at certain instructions, called deschedule instructions, after which the final values in the evaluation stack are undefined and the status register is reset to a default value. The deschedule instructions are stop and timeslice. Table 7.2 lists the multi-tasking instructions.
Mnemonic Name Run process Stop process Timeslice Load task descriptor Enqueue a process Dequeue a process
run stop timeslice ldtdesc enqueue dequeue
Table 7.2 Multi-tasking instructions
7.2
Descheduled processes
If the process is waiting for a peripheral or a semaphore or descheduled by a timeslice then the evaluation stack is not saved. The instruction pointer and Wptr are saved in the process descriptor block. The task descriptor is the address of the process descriptor block. It therefore identifies the waiting process and points to its saved state. The task descriptor is a fixed address for each process, unlike the Wptr which changes as the code executes. When the process is running, the task descriptor is held in the Tdesc register.
Word offset 2 1 0 Slot name pw.Iptr pw.Wptr pw.Link Purpose The process saved instruction pointer. The process saved work space pointer. The link to the next process in the queue.
Table 7.1 Process descriptor block
79/205
(R)
7.3 Queues The structure of the process descriptor block is shown in Table 7.1. When the process is not executing, it contains the saved work space pointer and instruction pointer of the process, plus a queue link if the process is in a queue. Figure 7.3 illustrates a descheduled process.
Process descriptor block Iptr Work space pointer Link
Process local work space (stack)
Code
Next process Task descriptor
Figure 7.3 A descheduled process
7.3
Queues
There may be any number of processes waiting for execution, so a queue (i.e. a linked list) of waiting processes is formed, called the scheduling queue. This is an example of a general queue supported by the instruction set for queueing waiting processes. A queue is a linked list of process control blocks, formed by links included in the process control blocks. Each link points to the control block of the next process in the queue unless it is the last in the queue, which is undefined. The front and back pointers of a queue are held in a queue control block, as shown in Table 7.2. The queue control block is held in memory, and the address of the block is the identifier of the queue .
Word offset 1 0 Slot name q.BPtrLoc q.FPtrLoc Purpose The back of the queue. The front of the queue.
Table 7.2 Queue control block A complete queue is illustrated in Figure 7.4. In the case of the scheduling queue, the control block is stored at the reserved address called SchedulerQptr (which has the
80/205
(R)
7 Multi-tasking value MostNeg) at the bottom of the memory space.
Queue control block Back Front Front Iptr Wptr Iptr Wptr Iptr Wptr Back Iptr Wptr
Process descriptor blocks
Figure 7.4 A process queue
7.4
Timeslicing
The ST20-C1 includes support for timeslicing. Timeslicing is a safeguard in a multitasking environment to prevent any one process from taking too much processor time. Exception handlers are not timesliced. A timeslicing counter is provided as a field of the status register called the timeslice_count. It is reset to MaxTimesliceCount each time a user process is loaded from the scheduling queue into the CPU. The counter is decremented regularly until it reaches 0, and then stays at 0. A timeslice is due when the counter value is 0. If a timeslice instruction is executed when a timeslice is due and timeslicing is enabled then: * * the timeslice trap will be taken if it is installed; otherwise the current process will be timesliced, i.e. the current process is placed on the back of the scheduling queue and the process on the front of the scheduling queue is loaded into the CPU for execution.
If an exception occurs, the value of the counter is saved with the status register and reloaded when the interrupted or trapped process is restarted. This ensures that a process will be executed for roughly the same time regardless of whether it was interrupted or trapped. The timeslice_enable bit of the status register can be used to enable or disable timeslicing. Timeslicing is enabled when the bit is set. This bit is preserved when a process is descheduled, so it may be treated as global among user processes. Timeslicing must not be enabled in exception handlers.
81/205
(R)
7.5 Inactive processes
7.5
Inactive processes
An inactive process is a process that is waiting for some event to occur, such as a process executing a run instruction or a peripheral completing a DMA. An inactive process cannot continue even if the CPU is idle. Inactive processes are not polled, but should be rescheduled by the event for which they are waiting. A process becomes inactive by saving its own state and then executing the stop instruction. stop automatically saves the Iptr and Wptr in the process control block. The code should save the Tdesc in an appropriate place where the awaited event can find it. The Tdesc points to the process control block. The Iptr is loaded onto the evaluation stack using ldpi, the Wptr using ldlp 0 and the Tdesc using ldtdesc. For example:
ldtdesc; ld tdesc_save_address; stnl 0; stop;
Two mechanisms are provided for rescheduling inactive processes: * Executing the run instruction. This allows other processes to reschedule an inactive process. For example, this is used when semaphore signalling to a waiting process and could be used by other communications, possibly implemented by an operating system. Scheduling an exception. This allows external devices to reschedule an inactive process. For example, this is used by DMA peripherals to wake a process waiting for the DMA to complete. Exception scheduling is described in section 4.11.
*
The run instruction takes a task descriptor in Areg and adds the process control block to the back of the scheduling queue. For example:
ld task_descriptor ; run;
Inactive processes may need to be queued, for example while waiting for a semaphore. The semaphore queue handling is incorporated into the signal and wait instructions, but for other queues the instructions enqueue and dequeue are needed. Enqueue will add a process to the back of an arbitrary queue, and dequeue will take a process from the front of the queue.
7.6
Descheduled process state
When a process restarts after an interrupt or trap, the entire state is loaded from the saved state. However, when the process starts or restarts after being descheduled, the CPU makes assumptions about the state of the process, since not all the state was saved. Wptr and Iptr are set to the values saved in pw.Wptr and pw.Iptr of the process descriptor block. The status register is set to the values shown in Table 7.3. The global interrupt enable and timeslice enable are global status bits, and are carried over from one process to the next when a context switch occurs.
82/205
(R)
7 Multi-tasking
Field or bit mac_count mac_buffer mac_scale mac_mode global_interrupt_enable local_interrupt_enable overflo w underflo w carry user_mode interrupt_mode trap_mode sleep reserved start_next_task timeslice_enable timeslice_count
Value As in previous process. As in previous process. As in previous process. As in previous process. As in previous process. Set. False. False. False. True. False. False. False. Undefined. False. As in previous process. MaxTimesliceCount.
Table 7.3 Restarted process status register
7.7
7.7.1
Initializing multi-tasking
Initializing the scheduling queue
Before multi-tasking operations can be performed, the scheduling queue must be initialized. The scheduling queue is described in section 7.3. The queue must be initialized by setting it to empty, which means setting the front pointer to empty:
ld MostNeg; ld SchedulerQptr; stnl q.FPtrLoc;
7.7.2 Creating and starting a process
A process consists of code, a work space area, a process control block and the values for the Iptr, Wptr and Tdesc. To create a process, a large enough work space is created, and a process control block of three words. It may be convenient to put all the process control blocks together in one area of memory. For fast context switches and minimal interrupt latency, process control blocks should be in on-chip memory. The entry point for the process code is written into pw.Iptr of the process control block. The address of the top of the work space is written into pw.Wptr of the process control block, so the process code must adjust the initial Wptr down using ajw to create space for local variables.
83/205
(R)
7.8 Scheduling kernels The following code would set up a process control block:
ld entry_point; ld control_block; stnl pw.Iptr; ld work_space_top; arot; stnl pw.Wptr;
The process is then inactive, so it can be added to the back of the scheduling queue using run, exactly as in section 7.5.
7.8
Scheduling kernels
A scheduling kernel can be written to override the default scheduling behavior of the ST20-C1, but still using the very fast micro-code scheduling. The basis of such a scheduler would be trap handlers to trap the system exceptions el_idle_trap, el_run_trap, el_stop_trap, el_timeslice_trap and el_schedule_exception_trap. When these traps are taken, a scheduling event is due or the processor has become idle. The trap replaces the default scheduling action. When the trap handler returns, the CPU will continue with the next instruction. Traps are described in Chapter 6. Additional system calls may be implemented by user-defined trap handlers, called using ecall.
7.9
Semaphores
Semaphores are a mechanism for managing access to shared resources within a multi-tasking environment. The semaphore operations are provided by the instructions listed in Table 7.4.
Mnemonic Name signal wait stop process run process
signal wait stop run
Table 7.4 Semaphore instructions The semaphore instructions act on a semaphore control block, defined in Table 7.5.
Word offset 2 1 0 Slot name s.Back s.Front s.Count Purpose The back of the semaphore waiting list. The front of the semaphore waiting list. The unsigned number of extraprocesses that the semaphore will allow to continue running on a wait request.
Table 7.5 Semaphore control block Each semaphore has a semaphore control block, which implements a linked list of waiting processes and a count of free resources. The count should be initialized to the total number of available resources (usually one), and the front list pointer should be initialized to the empty value NotProcess. For fast signalling and minimal interrupt latency, semaphore control blocks should be in fast memory, preferably on-chip.
84/205
(R)
7 Multi-tasking 7.9.1 Waiting for a resource
A process requesting a resource executes a wait (or P), which is the following code sequence:
ld semaphore; wait; cj CONTINUE; stop; CONTINUE:
The wait instruction is executed, with a pointer to the semaphore control block in Areg. The action of wait depends on the count in the semaphore control block. If the count is not zero, then a resource is free, so the count of free resources is decremented and the value false is left in the Areg to indicate that the process can continue. If the count is zero then there are no free resources, and the process is added to the list of waiting processes, and the value true is left in Areg to indicate that the process must wait. A conditional jump then tests Areg and performs a stop if the process must wait. stop saves the Iptr and Wptr and deschedules the process, which will wait on the semaphore queue until another process performs a signal and subsequently restarts the process with a run instruction. 7.9.2 Freeing a resource
When a process finishes with a resource and can free it, it performs a signal (or V), which is the following code sequence:
ld semaphore; signal; cj CONTINUE; run; CONTINUE:
The signal instruction is executed, with a pointer to the semaphore control block in Areg. The action of signal depends on whether a process is waiting or not. If the front pointer of the process waiting list is empty then there are no processes waiting, so the count is incremented and Areg is set to false. Otherwise, at least one process is waiting, so the first process is remo ved from the list and placed in the Breg, and Areg is set to true. A conditional jump tests Areg and performs a run to restart the process if there was one waiting.
7.10 Sleep
When the ST20-C1 becomes idle it disables counter distribution to its circuits and consumes a very small amount of electrical power; this is known as sleep mode. The counters are re-enabled and normal operation resumes automatically and instantly when either an interrupt or a software reset from the diagnostic controller is received. Sleep mode may also be triggered directly from software by setting the sleep bit in the status register:
ldc sleep; bitmask; statusset;
In this case also, sleep mode persists until the next interrupt or software reset.
85/205
(R)
7.10 Sleep
86/205
(R)
8 Instruction Set Reference
8
Instruction Set Reference
The following pages define the actions of each instruction in the ST20-C1 instruction set. The notation used is described in Chapter 2. The use of the instructions is described in Chapter 4, Chapter 6 and Chapter 7. The constants and data structures are listed in Appendix A.
87/205
(R)
adc n
Code: Function 8 Description: Add a constant to Areg. Definition: if (sum > MostPos) { Areg sum - 2BitsPerWord if (Statusunderflo w set ) Statusoverflow set } else if (sum < MostNeg) { Areg sum + 2BitsPerWord if (Statusoverflow set ) Statusunderflo w set } else { Areg sum } where
add constant
sum = Areg + n - the value of sum is calculated to unlimited precision
Status Register: Overflow or underflow bit may be set. Comments: Primary instruction. See also: add addc ldnlp section 4.4.
88/205
(R)
8 Instruction Set Reference
add
Code: F4 Description: Add Areg and Breg. Definition: if (sum > MostPos) { Areg sum - 2BitsPerWord if (Statusunderflo w set ) Statusoverflow set } else if (sum < MostNeg) { Areg sum + 2BitsPerWord if (Statusoverflow set ) Statusunderflo w set } else { Areg sum } Breg Creg where Creg Areg sum = Areg + Breg - the value of sum is calculated to unlimited precision
add
Status Register: Overflow or underflow bit may be set. Comments: Secondary instruction. See also: adc addc section 4.4.
89/205
(R)
addc
Code: 21 F0
add with carry
Description: Add Areg and Breg, unsigned, with carry propagation. This instruction is provided for long arithmetic; address calculations may be performed with add, sub and adc without affecting the carry flag. Definition: if (sum < 2BitsPerWord) { Aregunsigned Statuscarry } else { Aregunsigned Statuscarry } Breg Creg where Creg Areg sum = Aregunsigned + Bregunsigned + Statuscarry - the value of sum is calculated to unlimited precision
sum clear
sum - 2BitsPerWord set
Status Register: Carry bit is set or cleared. Comments: Secondary instruction. See also: adc add subc section 4.4.
90/205
(R)
8 Instruction Set Reference
ajw n
Code: Function B
adjust work space
Description: Move the workspace (stack) pointer by the number of words specified in the operand, in order to allocate or de-allocate workspace stack slots. Definition: Wptr Wptr @ n Status Register: No effect Comments: Primary instruction. See also: fcall gajw section 4.10.
91/205
(R)
and
Code: F9 Description: Bitwise AND of Areg and Breg. Definition: Areg Breg Areg Breg Creg Creg Areg
and
Status Register: No effect Comments: Secondary instruction. See also: not or xor section 4.8.
92/205
(R)
8 Instruction Set Reference
arot
Code: F3
anti-rotate stack
Description: Rotate the evaluation stack downwards. This instruction may be used to recover values rotated onto the bottom of the stack, e.g. by cj. Definition: Areg Creg Breg Areg Creg Breg Status Register: No effect Comments: Secondary instruction. See also: dup rev rot section 4.1.
93/205
(R)
ashr
Code: 21 FF
arithmetic shift right
Description: Perform an arithmetic shift right of Breg by Areg bits, copying the sign bit into the vacated bits. Breg, not Areg, is rotated into Creg, to preserve the value rather than the shift length. Definition: Areg (Breg >>arith Aregunsigned) Breg Creg Creg Breg
Status Register: No effect Comments: Secondary instruction. If Areg is not in the range 0..31 then the result is undefined. See also: shl shr section 4.9.
94/205
(R)
8 Instruction Set Reference
biquad
Code: 21 F7
biquad IIR filter step
Description: Execute a step of a biquad IIR filter on vectors of 16-bit values. Areg points to the C coefficient vector, Breg points to the X input data vector and Creg points to the Y results vector. Areg must be word-aligned and Breg and Creg halfword aligned. biquad increments Breg and Creg by 2 bytes and performs the five multiply accumulates Y[2] = X[0].C[0] + X[1].C[1] + X[2].C[2] + Y[0].C[3] + Y[1].C[4]. Definition: if (overflow) if (Statusunderflo w set ) Statusoverflowset else if (underflo w) if (Statusoverflow set ) Statusunderflo wset else { sixteen[Creg + 4] acc >>arith23 Breg Breg + 2 Creg Creg + 2 } where acc = 1 << 22 + ((sixteen[Areg] << shift) x sixteen[Breg]) + ((sixteen[Areg+2] << shift) x sixteen[Breg+2]) + ((sixteen[Areg+4] << shift) x sixteen[Breg+4]) + ((sixteen[Areg+6] << shift) x sixteen[Creg]) + ((sixteen[Areg+8] << shift) x sixteen[Creg+2]) - the value of acc is calculated to 48-bit precision if (Statusmac_scale = 3) shift = 9 else shift = 4 x Statusmac_scale Status Register: May set underflow or overflow. Comments: Areg must be word-aligned. Breg and Creg must be half-word aligned, and must either be both word-aligned or neither word-aligned. This instruction may take 8 memory accesses, which will affect interrupt latency. See also: mac, smacloop, umac Chapter 5.
95/205
(R)
bitld
Code: 22 F3
load bit
Description: Get a bit of Breg. The bit number is given by Areg. Breg is rotated into Creg, to preserve the value rather than the bit number. Definition: Areg (Breg >> Areg) 1 Breg Creg Creg Breg
Status Register: No effect Comments: Secondary instruction. Areg must be in the range 0..31. See also: bitst bitmask section 4.8.
96/205
(R)
8 Instruction Set Reference
bitmask
Code: 22 F5
create bit mask
Description: Create a bit mask with a single bit set. The value in Areg indicates the bit that is to be set. Definition: Areg 1 << Areg Status Register: No effect Comments: Secondary instruction. Areg must be in the range 0..31. See also: bitld section 4.8.
97/205
(R)
bitst
Code: 22 F4
store bit
Description: Overwrite the bit position Areg of the value in Creg with a single bit with the value given by Breg. Breg is rotated into Creg, to preserve the value rather than the bit number. Definition: Areg (Creg ~(1 << Areg)) (Breg << Areg) Breg Creg Creg Breg
Status Register: No effect Comments: Secondary instruction. Areg must be in the range 0..31. Breg must be 0 or 1. See also: bitld bitmask section 4.8.
98/205
(R)
8 Instruction Set Reference
breakpoint
Code: FF Description: Take a breakpoint trap if a breakpoint trap is installed. Definition: if (breakpoint trap installed) take breakpoint trap
breakpoint
Status Register: If the trap is taken then the status register is saved and the trap handler status register is loaded. Comments: Secondary instruction. This instruction is a short secondar y instruction, encoded in a single byte. See also: Chapter 6.
99/205
(R)
cj n
Code: Function A
conditional jump
Description: Jump if Areg is 0 (i.e. jump if false). The destination of the jump is expressed as a byte offset from the instruction following the conditional jump. Definition: if (Areg = false) Iptr next instruction + n else { Areg Breg Breg Creg Creg Areg } Status Register: No effect Comments: Primary instruction. The initial Areg can be recovered using arot. See also: eqc gt gtu j jab section 4.6.
100/205
(R)
8 Instruction Set Reference
dequeue
Code: 23 F8
dequeue a process
Description: Load into Breg a pointer to the first process from the queue control block pointed to by Areg. Load into Areg a boolean value indicating whether a process was found. The queue control block is defined in Chapter 7. Definition: if (frontptr = NotProcess) -- queue is empty { Areg false Breg undefined } else -- queue is not empty { Areg true Breg frontptr if (frontptr = backptr) -- one process on queue word[Areg @ q.FPtrLoc] NotProcess else -- many processes on queue word[Areg @ q.FPtrLoc] word[ frontptr @ pw.Link ] } Creg undefined
where frontptr = word[Areg @ q.FPtrLoc] backptr = word[Areg @ q.BPtrLoc] Status Register: No effect Comments: Secondary instruction. This instruction may require 4 memory accesses. If the queue structure is in offchip memory then interrupt latency may be affected. Areg must be word aligned (i.e. divisible by 4). See also: enqueue run Chapter 7.
101/205
(R)
divstep
Code: 21 F8
divide step
Description: Perform a step of an integer division to generate 4 bits of the quotient. Areg holds the positive divisor. Breg and Creg hold the non-negative dividend or partial remainder and the partial result. Before the first step, Breg holds the dividend and Creg must be 0. After performing divstep 8 times, Breg will hold the quotient and Creg the remainder. Definition: if (Areg > 0 and Breg 0 and Creg 0 ) { Breg (Breg << 4) (part_div / Areg) Creg part_div rem Areg } else { Breg undefined Creg undefined } where part_div = (Creg << 4) (Breg >> 28) - the value of part_div is calculated to unlimited precision
Status Register: No effect. Comments: Secondary instruction. Areg must be greater than zero. Breg and Creg must be not less than zero. See also: unsign section 4.4.
102/205
(R)
8 Instruction Set Reference
dup
Code: F1 Description: Duplicate the top of the integer stack. Definition: Breg Areg Creg Breg Status Register: No effect Comments: Secondary instruction. See also: arot rev rot section 4.8.
duplicate top of stack
103/205
(R)
ecall
Code: 23 F1
exception call
Description: Take a trap with exception level indicated by Areg. This instruction can be used to implement system calls to an operating system. Definition: Areg Breg Breg Creg Creg Areg if (trap level Areg is installed) take trap level Areg Status Register: The status register is saved and a new status register with set values is loaded. Comments: Secondary instruction. Areg must be in the range LowestException to HighestException. The exception type must be ExceptionProcessType. See also: eret Chapter 6.
104/205
(R)
8 Instruction Set Reference
enqueue
Code: 23 F7
enqueue a process
Description: Add the process indicated by the process descriptor in Breg to the queue structure in Areg. The queue control block is defined in Chapter 7. Definition: if (word[Areg @ q.FPtrLoc] = NotProcess) { word[Areg @ q.FPtrLoc] Breg word[Areg @ q.BPtrLoc] Breg } else { word[word[Areg @ q.BPtrLoc] @ pw.Link] word[Areg @ q.BPtrLoc] Breg } Areg Breg Creg undefined undefined undefined
-- queue is empty
Breg
Status Register: No effect Comments: Secondary instruction. This instruction may require 4 memory accesses. If the queue structure is in offchip memory then interrupt latency may be affected. Areg must be word aligned (i.e. divisible by 4). See also: dequeue, stop Chapter 7.
105/205
(R)
eqc n
Code: Function C Description: Compare Areg to a constant. Definition: if (Areg = n) Areg true else Areg false Status Register: No effect Comments: Primary instruction. See also: cj gt gtu section 4.6.
equals constant
106/205
(R)
8 Instruction Set Reference
eret
Code: 23 F2 Description: Return from an interrupt or trap. Definition: if (Statustrap_mode = clear) signal interrupt return to interrupt controller Areg Breg Creg Wptr Iptr Tdesc Status word[Wptr @ word[Wptr @ word[Wptr @ word[Wptr @ word[Wptr @ word[Wptr @ word[Wptr @ ex.InterptdAreg] ex.InterptdBreg] ex.InterptdCreg] ex.InterptdWptr] ex.InterptdIptr] ex.InterptdTdesc] ex.InterptdStatus]
exception return
Status Register: Saved status register is restored. Comments: Secondary instruction. The interrupted Wptr must be word aligned. See also: ecall Chapter 6.
107/205
(R)
fcall n
Code: Function 9 Description: Call subroutine at the specified byte offset. Definition: word[Wptr @ 0] Iptr Iptr
function call
next instruction + n
Status Register: No effect Comments: Primary instruction. See also: ajw jab section 4.4.
108/205
(R)
8 Instruction Set Reference
gajw
Code: 23 FB
general adjust workspace
Description: Set the workspace pointer to the word aligned address in Areg, saving the previous value in Areg. Definition: Wptr Areg Areg Wptr Status Register: No effect Comments: Secondary instruction. Areg should be word aligned (i.e. divisible by 4). See also: ajw fcall section 4.4.
109/205
(R)
gt
Code: 21 FB
greater than
Description: Compare the top two elements of the stack, returning true if Breg is greater than Areg. Definition: if (Breg > Areg) Areg true else Areg false Breg Creg Creg Areg
Status Register: No effect Comments: Secondary instruction. See also: eqc gtu order section 4.6.
110/205
(R)
8 Instruction Set Reference
gtu
Code: 21 FC
greater than unsigned
Description: Compare the top two elements of the stack, treating both as unsigned integers, returning true if Breg is greater than Areg. Definition: if (Bregunsigned > Aregunsigned) Areg true else Areg false Breg Creg Creg Areg
Status Register: No effect Comments: Secondary instruction. See also: eqc gt orderu section 4.6.
111/205
(R)
io
Code: 23 FD
input/output
Description: Set, clear and test bits of the IO register. The bits set in the least significant half word of Breg are cleared in the IO register and then the bits set in the least significant half word of Areg are set in the IO register. The initial IO register is copied into the Areg. Definition: IOreg (IOreg (~Breg #FFFF0000)) (Areg #FFFF) Areg IOreg Breg Creg Creg Areg
Status Register: No effect Comments: Secondary instruction. Some bits of the IO register may be reserved in some variants. See the datasheet for the variant. See also: bitmask section 4.11.
112/205
(R)
8 Instruction Set Reference
jn
Code: Function 0
jump
Description: Unconditional relative jump. The destination of the jump is expressed as a byte offset from the first b yte after the current instruction. Definition: Iptr next instruction + n Status Register: No effect. Comments: Primary instruction. See also: cj jab section 4.6.
113/205
(R)
jab
Code: FD
jump absolute
Description: Jump to the address in Areg, saving the previous address in Creg. This instruction can be used for function and procedure returns and to jump to a run-time computed location. Definition: Iptr Areg Breg Creg Areg Breg Creg Iptr
Status Register: No effect Comments: Secondary instruction. See also: cj j section 4.4.
114/205
(R)
8 Instruction Set Reference
lbinc
Code: 22 FA
load byte and increment
Description: Load the unsigned byte addressed by Areg into Areg and increment the address by one byte. Definition: Areg0..7 byte[Areg] Areg8..BitsPerWord-1 0 Creg Areg + 1 Status Register: No effect Comments: Secondary instruction. See also: ldl ldnl lsinc lwinc sbinc section 4.2.
115/205
(R)
ldc n
Code: Function 4 Description: Load constant into Areg. Definition: Areg n Breg Creg Areg Breg
load constant
Status Register: No effect Comments: Primary instruction. See also: adc ldl section 4.2.
116/205
(R)
8 Instruction Set Reference
ldl n
Code: Function 7
load local
Description: Load into Areg the local variable at the specified word offset in workspace. ldl cannot read from addresses reserved for peripheral registers. Definition: Areg word[Wptr @ n] Breg Creg Areg Breg
Status Register: No effect Comments: Primary instruction. Areg must be word aligned (i.e. divisible by 4). See also: ldnl stl section 4.2.
117/205
(R)
ldlp n
Code: Function 1
load local pointer
Description: Load into Areg the address of the local variable at the specified offset in workspace. Definition: Areg Breg Creg Wptr @ n Areg Breg
Status Register: No effect Comments: Primary instruction. See also: ldl ldnlp section 4.5.
118/205
(R)
8 Instruction Set Reference
ldnl n
Code: Function 3
load non-local
Description: Load into Areg the non-local variable at the specified word offset from Areg. Definition: Areg word[Areg @ n] Status Register: No effect Comments: Primary instruction. Areg must be word aligned (i.e. divisible by 4). See also: ldl ldnlp stnl section 4.2.
119/205
(R)
ldnlp n
Code: Function 5
load non-local pointer
Description: Load into Areg the address at the specified word offset from the address in Areg. Definition: Areg Areg @ n Status Register: No effect Comments: Primary instruction. See also: ldlp ldnl wsub section 4.5.
120/205
(R)
8 Instruction Set Reference
ldpi
Code: 23 FA
load pointer to instruction
Description: Load into Areg an absolute address calculated from an offset relative to the current instruction pointer. Areg contains a byte offset which is added to the address of the first b yte following this instruction. Definition: Areg next instruction + Areg Status Register: No effect Comments: Secondary instruction. See also: jab section 4.5.
121/205
(R)
ldprodid
Code: 23 FC
load product identity
Description: Load a value indicating the product identity into Areg. Each product in the ST20 family has a unique product identity code. Definition: Areg ProductId Breg Creg Areg Breg
Status Register: No effect Comments: Secondary instruction. Different ST20 products may use the same processor type, but return different product identity codes. However a product identity code uniquely defines the processor type used in that product. For specific product identities in the ST20 family refer to SGS-THOMSON. See also: section 4.2.
122/205
(R)
8 Instruction Set Reference
ldtdesc
Code: 23 F9
load task descriptor
Description: Load the task descriptor of the current task. Definition: Areg Tdesc Breg Areg Creg Breg Status Register: No effect Comments: Secondary instruction. See also: ldlp ldpi Chapter 6.
123/205
(R)
lsinc
Code: 22 FC
load sixteen and increment
Description: Load the unsigned 16-bit object addressed by Areg into Areg and increment the address by two bytes. Definition: Areg0..15 sixteen[Areg] Areg16..BitsPerWord-1 0 Creg Areg + 2 Status Register: No effect Comments: Secondary instruction. Areg must be half-word aligned (i.e. divisible by 2). See also: lbinc ldl lsxinc lwinc ssinc section 4.2.
124/205
(R)
8 Instruction Set Reference
lsxinc
Code: 22 FD
load sixteen sign extended and increment
Description: Load the signed 16-bit object addressed by Areg into Areg, sign extend to a word, and increment the address by two bytes. Definition: Areg0..15 sixteen[Areg] Areg16..BitsPerWord-1 sixteen[Areg]15 Creg Areg + 2 Status Register: No effect Comments: Secondary instruction. Areg must be half-word aligned (i.e. divisible by 2). See also: ldl lsinc lwinc ssinc xsword section 4.2.
125/205
(R)
lwinc
Code: 22 FF
load word and increment
Description: Load a variable from the address given in Areg and increment the address by one word. This instruction may be used with stinc as a step in a block move, and Breg is not modified so that it can be used to hold the store address . Definition: Areg word[Areg] Creg Areg @ 1 Status Register: No effect Comments: Secondary instruction. Areg must be word aligned (i.e. divisible by 4). See also: lbinc lsinc swinc section 4.2.
126/205
(R)
8 Instruction Set Reference
mac
Code: 21 F2
multiply accumulate
Description: Multiply Areg and Breg, and add in Creg. The most significant word of the result is placed in Breg and the least significant in Areg. This can be used in forming the double length product of Areg and Breg, with Creg as carry in, treating the Areg and Breg values as signed but Creg as unsigned. Definition: Areg Breg Creg where low_word (accum64) high_word (accum64) Areg accum64 = (Breg x Areg) + Cregunsigned - the value of accum64 is calculated as a 64-bit value
Status Register: No effect Comments: Secondary instruction. See also: ldiv mul umac Chapter 5.
127/205
(R)
mul
Code: F6
multiply
Description: Multiply Areg by Breg, with checking for overflow but no carry. Definition: Areg low_word (prod) if (prod > MostPos) { if (Statusunderflo w set ) Statusoverflow set } else if (prod < MostNeg) { if (Statusoverflow set ) Statusunderflo w set } Breg Creg where Creg Areg prod = Areg x Breg - the value of prod is calculated to unlimited precision
Status Register: Overflow or underflow bit may be set. Comments: Secondary instruction. See also: add mac section 4.4.
128/205
(R)
8 Instruction Set Reference
nop
Code: 23 FF Description: Perform no operation. Definition: no change Status Register: No effect Comments: Secondary instruction.
no operation
129/205
(R)
not
Code: F8 Description: Complement bits in Areg. Definition: Areg Areg Status Register: No effect Comments: Secondary instruction. See also: and or xor section 4.8.
bitwise not
130/205
(R)
8 Instruction Set Reference
or
Code: FA Description: Bitwise OR of Areg and Breg. Definition: Areg Breg Areg Breg Creg Creg Areg
or
Status Register: No effect Comments: Secondary instruction. See also: and not xor section 4.8.
131/205
(R)
order
Code: 21 FD
order
Description: Order Areg and Breg into signed ascending order, so that Breg is greater than or equal to Areg. This can be used to implement maximum, minimum, absolute value and saturation at less than 32 bits. Definition: if (Areg > Breg) { Areg Breg Breg Areg } Status Register: No effect Comments: Secondary instruction. See also: gt orderu rev section 4.6.
132/205
(R)
8 Instruction Set Reference
orderu
Code: 21 FE
unsigned order
Description: Order Areg and Breg into unsigned ascending order, so that Breg is greater than or equal to Areg. This can be used to implement unsigned maximum and minimum. Definition: if (Aregunsigned > Bregunsigned) { Areg Breg Breg Areg } Status Register: No effect Comments: Secondary instruction. See also: gtu order rev section 4.6.
133/205
(R)
rev
Code: F0 Description: Swap the values in Areg and Breg. Definition: Areg Breg Breg Areg Status Register: No effect Comments: Secondary instruction. See also: dup order orderu rot section 4.8.
reverse
134/205
(R)
8 Instruction Set Reference
rmw
Code: 22 F9
memory read modify write
Description: Clear and set bits of the memory word at address Areg using the set mask in Breg and clear mask in Creg. Definition: word[Areg] (word[Areg] ~Creg) Breg Areg word[Areg] Breg Creg Areg Breg
Status Register: No effect Comments: Secondary instruction. Areg must be word aligned (i.e. divisible by 4). See also: bitmask section 4.8.
135/205
(R)
rot
Code: F2 Definition:Rotate the evaluation stack upwards. Definition: Areg Breg Breg Creg Creg Areg Status Register: No effect Comments: Secondary instruction. See also: arot dup rev section 4.8.
rotate stack
136/205
(R)
8 Instruction Set Reference
run
Code: 23 F3
run process
Description: Spawn a new process. Add the new process to the back of the process queue defined b y the control block at SchedulerQptr. The task descriptor of the process is in Areg; this identifies the process descr iptor block. The instruction pointer and work space pointer must have been written in the process descriptor block of the process at offsets pw.Iptr and pw.Wptr words. Definition: if (run trap handler installed) take run trap else { -- queue is empty if (word [SchedulerQptr @ q.FPtrLoc] = NotProcess) { word[SchedulerQptr @ q.FPtrLoc] Areg word[SchedulerQptr @ q.BPtrLoc] Areg } else { word[ word [SchedulerQptr @ q.BPtrLoc] @ pw.Link] Areg word[SchedulerQptr @ q.BPtrLoc] Areg } Areg Breg Creg } Status Register: No effect Comments: Secondary instruction. Areg must be word aligned (i.e. divisible by 4). This instruction may require up to three on-chip memory accesses and one offchip. This may affect interrupt latency. See also: dequeue enqueue signal stop Chapter 7. undefined undefined undefined
137/205
(R)
saturate
Code: 21 FA
saturate arithmetic result
Description: Saturate the value in Areg to MostPos or MostNeg if overflow or underflow respectively has occurred, and clear the overflow or underflow. Definition: if (Statusoverflow) Areg MostPos else if (Statusunderflo w) Areg MostNeg Statusoverflow clear Statusunderflo w clear Status Register: Overflow and underflow bits are cleared. Comments: Secondary instruction. See also: adc add mul smul sub section 4.4.
138/205
(R)
8 Instruction Set Reference
sbinc
Code: 22 FB
store byte and increment
Description: Store the least significant b yte of Breg into the byte of memory addressed by Areg and increment the address by one byte. Definition: byte[Areg] Breg0..7 Areg Breg Creg Creg Areg + 1 Breg
Status Register: No effect Comments: Secondary instruction. See also: lbinc ssinc swinc stnl section 4.2.
139/205
(R)
shl
Code: FB
shift left
Description: Logical shift of Breg left by Areg bits, filling with z ero bits. If the initial Areg is not between 0 and 31 inclusive then the result is undefined. Definition: Areg Breg << Areg Breg Creg Creg Breg
Status Register: No effect Comments: Secondary instruction. Areg must be in the range 0..31. See also: shr section 4.9.
140/205
(R)
8 Instruction Set Reference
shr
Code: FC
shift right
Description: Logical shift of Breg right by Areg places, filling with z ero bits. If the initial Areg is not between 0 and 31 inclusive then the result is undefined. Definition: Areg Breg >> Areg Breg Creg Creg Breg
Status Register: No effect Comments: Secondary instruction. Areg must be in the range 0..31. See also: ashr shl section 4.9.
141/205
(R)
signal
Code: 23 F5
signal
Description: Perform the first par t of a semaphore signal (or V) on the semaphore pointed to by Areg. If no process is waiting then the unsigned count is incremented. If processes are waiting then the first process is removed from the queue and its identifier placed in Breg ready to be run. The instruction returns a boolean value in Areg stating whether a run process is required. This instruction should be used in the following sequence: ld semptr; signal; cj continue; run; continue: Definition: -- queue empty if (frontptr = NotProcess) { word[Areg @ s.Count] word[Areg @ s.Count] + 1 Areg false Breg undefined } else -- queue not empty { if (frontptr = backptr) -- one process on queue word[Areg @ s.Front] NotProcess else -- many processes on queue word[Areg @ s.Front] word[ frontptr @ pw.Link ] Areg Breg } Creg undefined true frontptr
where frontptr = word[Areg @ s.Front] backptr = word[Areg @ s.Back] Status Register: No effect Comments: Secondary instruction. The increment of the count is unchecked. This instruction may require 4 accesses to memory. The semaphore structure should be placed in on-chip memory if interrupt latency is critical. See also: run wait Chapter 7.
142/205
(R)
8 Instruction Set Reference
smacinit
Code: 21 F5
initialize short multiply accumulate loop
Description: Set the loop count, buffer size, scaling and data format for the smacloop instruction. Definition: Statusmac_count Statusmac_buffer Statusmac_scale Statusmac_mode Areg Breg Creg Areg #FF (Areg >> 8) #7 (Areg >> 11) #3 (Areg >> 13) #1
Breg Creg 0
Status Register: No effect. Comments: Secondary instruction. See also: smacloop Chapter 5.
143/205
(R)
smacloop
Code: 21 F6
short multiply accumulate loop
Description: Multiply pairs of signed16-bit values and accumulate the products. On entry Areg and Breg contain the addresses of arrays X and Y of 16-bit values and Creg holds the initial 32-bit accumulator. On exit the accumulator is in Areg, while Breg and Creg contain the next addresses in the X and Y arrays respectively. The X values can be in a circular buffer whose address is exactly divisible by the size. The loop count, buffer size, mode and scaling codes, held in the status register, may be set by the smacinit instruction. Warning: This instruction is not interruptible. If interrupt latency is critical then a similar result may be achieved by executing the instruction many times with small counts. The result may not be the same, since rounding occurs at the end of the instruction from the internal 48-bit accumulator to a 32-bit result. Definition: if (Statusoverflow) Areg max else if (Statusunderflo w) Areg min else if (overflow) { Statusoverflow set Areg max } else if (underflow) { Statusunderflo w set Areg min } else { Areg acc >>arith shift1 } Breg Creg xaddress (Statusmac_count) Breg + (2 x Statusmac_count)
where acc = (Creg << shift1) + (1 << (shift1-1)) + i=0..n-1[sixteen[xaddress(i )] x (sixteen[Breg + 2.i] << shift2)] - the value of acc is calculated to 48-bit precision if (Statusmac_count = 0) n=256
144/205
(R)
8 Instruction Set Reference else n = Statusmac_count
if (Statusmac_buffer 0) { buff_size = 4 << Statusmac_buffer xaddress (i) = Areg - (Areg rem buff_size) + ((Areg + 2.i ) rem buff_size) } else xaddress(i) = Areg + (2 x i) if (Statusmac_mode = LongMode) { shift1 = 7 max = MostPos min = MostNeg } else { shift1 = 23 max = #7FFF min = #8000 } if (Statusmac_scale = 3) shift2 = 9 else shift2 = 4 x Statusmac_scale Status Register: Overflow or underflow may be set. Comments: Secondary instruction. Areg must be half-word aligned. Breg must be word aligned. See also: biquad mac mul smacinit Chapter 5.
145/205
(R)
smul
Code: 21 F4
short multiply
Description: Multiply Areg by Breg, treated as sixteen bit signed integers. Definition: Areg (int16)Areg x (int16)Breg Breg Creg Creg Areg
Status Register: No effect Comments: Secondary instruction. Areg and Breg must be in the range of 16-bit signed integers. See also: mul smac section 4.4.
146/205
(R)
8 Instruction Set Reference
ssinc
Code: 22 FE
store sixteen and increment
Description: Store the least significant 16 bits of Breg into the half word of memory addressed by Areg and increment the address by two bytes. Definition: sixteen[Areg] Areg Breg Creg Status Register: No effect Comments: Secondary instruction. Areg must be half-word aligned (i.e. divisible by 2). See also: lsinc lsxinc sbinc stl stnl swinc section 4.2. Breg0..15 Creg Areg + 2 Breg
147/205
(R)
statusclr
Code: 22 F7
clear bits in status register
Description: Clear the bits of the status register which are set in the Areg. The initial status register value is returned in the Areg. Definition: Areg Status Status Status ~Areg
Status Register: The bits of Areg which are set will be cleared in the Status Register. Comments: Secondary instruction. See also: statusset statustst section 4.12.
148/205
(R)
8 Instruction Set Reference
statusset
Code: 22 F6
set bits in status register
Description: Set the bits of the Status Register which are set in Areg. The initial Status Register value is returned in the Areg. Definition: Areg Status Status Status Areg
Status Register: The bits of Areg which are set will be set in the Status Register. Comments: Secondary instruction. See also: statusclr statustst section 4.12.
149/205
(R)
statustst
Code: 22 F8
test status register
Description: Test the bits of the Status Register which are set in the Areg. Definition: Areg Status Areg
Status Register: No effect Comments: Secondary instruction. See also: bitmask statusclr statusset section 4.12.
150/205
(R)
8 Instruction Set Reference
stl n
Code: Function D
store local
Description: Store the contents of Areg into the local variable at the specified w ord offset in workspace. This instruction cannot write to addresses reserved for peripheral registers. Definition: word[Wptr @ n] Areg Breg Creg Breg Creg Areg Areg
Status Register: No effect Comments: Primary instruction. See also: ldl sbinc ssinc stnl section 4.2.
151/205
(R)
stnl n
Code: Function E
store non-local
Description: Store the contents of Breg into the non-local variable at the specified word offset from Areg. Definition: word[Areg @ n] Areg Breg Creg Creg Areg Breg Breg
Status Register: No effect Comments: Primary instruction. Areg must be word aligned (i.e. divisible by 4). See also: ldnl sbinc ssinc swinc stl section 4.2.
152/205
(R)
8 Instruction Set Reference
stop
Code: 23 F4
stop process
Description: Terminate the current process, saving the current Iptr and Wptr for later use, and start the next process from the scheduling queue. If the scheduling queue is empty then the CPU will become idle. This instruction is implemented in two stages to protect interrupt latency. The first stage deschedules the current process and sets the start_next_task status bit. The second stage clears the start_next_task status bit and starts the next process. An interrupt may occur between the two stages. Definition: if (stop trap handler installed) take stop trap else { word[Tdesc @ pw.Iptr] word[Tdesc @ pw.Wptr] Statuslocal_interrupt_enable Statusoverflow Statusundeflo w Statuscarry Statusinterrupt_mode Statustrap_mode Statusstart_next_task Statustimeslice_count

next instruction Wptr set clear clear clear clear clear clear MaxTimesliceCount
-- queue is empty, so become idle
-- start next process on scheduling queue
if (frontptr = NotProcess) { Statussleep set Statususer_mode clear if (idle trap handler installed) take idle trap else set idle } else { Statussleep Statususer_mode Tdesc Wptr
-- processes waiting, so start next process

clear set frontptr word [frontptr @ pw.Wptr]
153/205
(R)
Iptr word [frontptr @ pw.Iptr] if (frontptr = backptr) --one process on queue, so make it empty word[SchedulerQptr @ q.FPtrLoc] NotProcess else --many processes, so remove first word[SchedulerQptr @ q.FPtrLoc] word[frontptr @ pw.Link] } Areg Breg Creg } where frontptr = word[SchedulerQptr @ q.FPtrLoc] backptr = word[SchedulerQptr @ q.BPtrLoc] Status Register: Global interrupt enable and timeslice enable are preserved. Start_next_task is set when the current process is descheduled and cleared when the new process is started. An interrupt may occur between these two stages. Other bits are set to the values listed above. Comments: Secondary instruction. Deschedules the current process. This instruction must not be used in an exception handler. See also: dequeue enqueue run wait Chapter 7. undefined undefined undefined
154/205
(R)
8 Instruction Set Reference
sub
Code: F5 Description: Subtract Areg from Breg, with checking for overflow. Definition: if (diff > MostPos) { Areg diff - 2BitsPerWord if (Statusunderflo w set ) Statusoverflow set } else if (diff < MostNeg) { Areg diff + 2BitsPerWord if (Statusoverflow set ) Statusunderflo w set } else { Areg diff } Breg Creg where Creg Areg diff = Breg - Areg - the value of diff is calculated to unlimited precision
subtract
Status Register: Overflow or underflow bit may be set. Comments: Secondary instruction. See also: add subc section 4.4.
155/205
(R)
subc
Code: 21 F1
subtract with carry
Description: Subtract Areg from Breg, unsigned with carry propagation. This instruction is provided for long arithmetic; address calculations may be performed with add, sub and adc without affecting the carry flag. Definition: if ( diff 0 ) { Aregunsigned Statuscarry } else { Aregunsigned Statuscarry } Breg Creg where Creg Areg diff = (Bregunsigned - Aregunsigned) - Statuscarry - the value of diff is calculated to unlimited precision
diff clear
diff + 2BitsPerWord set
Status Register: Carry bit is set or cleared. Comments: Secondary instruction. See also: addc sub section 4.4.
156/205
(R)
8 Instruction Set Reference
swap32
Code: 23 FE Description: Reverse the order of the bytes within Areg.
byte swap 32
Definition: Areg (Areg0..7 << 24) (Areg8..15 << 16) (Areg16..23 << 8) Areg24..31 Status Register: No effect Comments: Secondary instruction. See also: section 4.9.
157/205
(R)
swinc
Code: 23 F0
store word and increment
Description: Store the value from the Breg at the memory address given in the initial Areg and increment the address by one word. This instruction may be used with lwinc as a step in a block move. The value in Creg is copied into the Areg. Definition: word[Areg] Areg Breg Creg Status Register: No effect Comments: Areg must be word aligned (i.e. divisible by 4). Secondary instruction. See also: ldinc sbinc ssinc stnl section 4.2. Breg Creg Areg @ 1 Breg
158/205
(R)
8 Instruction Set Reference
timeslice
Code: FE
timeslice
Description: Deschedules the current process and starts the next task if a timeslice is due and timeslicing is enabled. This instruction is implemented in two stages to protect interrupt latency. The first stage deschedules the current process (even if the queue is empty), adds it to the back of the scheduling queue and sets the start_next_task status bit. The second stage clears the start_next_task status bit and starts the next process from the scheduling queue. An interrupt may occur between the two stages. Definition: if ( Statustimeslicing enabled and Statustimeslice_count=0 ) { if (timeslice trap installed) take timeslice trap else {
-- save state of current process
word[Tdesc @ pw.Iptr] word[Tdesc @ pw.Wptr] if (frontptr NotProcess) {
next instruction Wptr --queue not empty
-- add current process to scheduling queue
word[backptr @ pw.Link] Tdesc word[SchedulerQptr @ q.BPtrLoc] Tdesc
-- remove front process from queue
word[SchedulerQptr @ q.FPtrLoc] word[frontptr @ pw.Link]
-- start next process from scheduling queue
Tdesc Wptr Iptr }
frontptr word [frontptr @ pw.Wptr] word [frontptr @ pw.Iptr] set clear clear clear set clear clear clear clear
Statuslocal_interrupt_enable Statusoverflow Statusundeflo w Statuscarry Statususer_mode Statusinterrupt_mode Statustrap_mode Statussleep Statusstart_next_task
159/205
(R)
Statustimeslice_count Areg Breg Creg } } undefined undefined undefined
MaxTimesliceCount
where frontptr = word[SchedulerQptr @ q.FPtrLoc] backptr = word[SchedulerQptr @ q.BPtrLoc] Status Register: Global interrupt enable and timeslice enable preserved. Start_next_task is set when the current process is descheduled and cleared when the new process is started. An interrupt may occur between these two stages. Other bits are set to the values listed above. Comments: Secondary instruction. This instruction must not be used in an exception handler. See also: stop run Chapter 7.
160/205
(R)
8 Instruction Set Reference
umac
Code: 21 F3
unsigned multiply accumulate
Description: Multiply Areg by Breg, adding in Creg, to form a double word result, treating the initial values as unsigned. This can be used in forming the double length product of Areg and Breg, with Creg as carry in. Definition: Aregunsigned Bregunsigned Creg low_word (prod) high_word (prod)
Areg
where prod = (Bregunsigned x Aregunsigned) + Cregunsigned - the value of prod is calculated to unlimited precision Status Register: No effect Comments: Secondary instruction. See also: mac Chapter 5.
161/205
(R)
unsign
Code: 21 F9
unsign argument
Description: Remove the sign of the value in Areg and exclusive-or it with a sign flag in Breg. Definition: if (Areg 0) { Areg Breg } else { Areg Breg } Status Register: No effect Comments: Secondary instruction To aid the implementation of signed division. Breg must be 0 or 1. See also: divstep section 4.4.
Breg Areg
1 - Breg - Areg
162/205
(R)
8 Instruction Set Reference
wait
Code: 23 F6
wait
Description: Perform the first par t of a semaphore wait (or P) on the semaphore pointed to by Areg. If the semaphore count is non-zero then the count is decremented and the process continues; otherwise the current process is added to the back of the semaphore list ready for the current process to be descheduled. The instruction returns a boolean value in Areg stating whether a process deschedule is required. The instruction should be used in the following sequence: ld semptr; wait; cj continue; stop; continue: Definition: if (word[Areg @ s.Count] 0) { word[Areg @ s.Count] word[Areg @ s.Count] - 1 Areg false } else { if (frontptr = NotProcess) -- queue empty so add to queue { word[Areg @ s.Front ] Tdesc word[Areg @ s.Back ] Tdesc } else -- queue not empty so add to back of queue { word[backptr @ pw.Link] Tdesc word[Areg @ s.Back ] Tdesc } Areg true } Breg undefined Creg undefined where frontptr = word[Areg @ s.Front] backptr = word[Areg @ s.Back] Status Register:No effect Comments: Secondary instruction. Areg must be word-aligned. The semaphore structure should be in fast memory if interrupt latency is critical. See also: signal stop Chapter 7.
163/205
(R)
wsub
Code: F7
word subscript
Description: Generate the address of the element which is indexed by Breg in the word array pointed to by Areg. Definition: Areg Areg @ Breg Breg Creg Creg Areg
Status Register: No effect Comments: Secondary instruction. See also: ldlp ldnlp section 4.5.
164/205
(R)
8 Instruction Set Reference
xbword
Code: 22 F1
sign extend byte to word
Description: Sign-extend the value in the least significant byte of Areg into a signed word. Definition: Areg0..7 Areg8..BitsPerWord-1 Status Register: No effect Comments: Secondary instruction. See also: xsword section 4.4.6. Areg0..7 Areg7
165/205
(R)
xor
Code: 21 F0 Description: Bitwise exclusive or of Areg and Breg. Definition: Areg Breg Areg Breg Creg Creg Areg
exclusive or
Status Register: No effect Comments: Secondary instruction. See also: and not or section 4.8.
166/205
(R)
8 Instruction Set Reference
xsword
Code: 22 F2
sign extend sixteen to word
Description: Sign extend the value in the least significant 16 bits of Areg to a signed word. Definition: Areg0..15 Areg16..BitsPerWord-1 Status Register: No effect Comments: Secondary instruction. See also: xbword section 4.4.6. Areg0..15 Areg15
167/205
(R)
Appendices
168/205
(R)
A Constants and data structures
This appendix gives a full listing of all the constants and data structures used in this document, with brief descriptions and values.
169/205
(R)
A.1
Status Register
Full name Bit numbers 0-7 8 - 10 11 - 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 - 31 Meaning when set or meaning of value Multiply-accumulate number of steps. Multiply-accumulate data buffer size code. Multiply-accumulate scaling code. Multiply-accumulate accumulator format code. Disable external interrupts until explicitly enabled. Disable external interrupts until process descheduled. An arithmetic operation gave a positive overflow. An arithmetic operation gave a negative overflow. An arithmetic operation produced a carry. A user process is executing. An interrupt handler is executing. A trap handler is executing. The processor is due to go to sleep. Reserved. The CPU must start executing a new process. Timeslicing is enabled. Timeslice clock.
mac_count mac_buffer mac_scale mac_mode global_interrupt_disable local_interrupt_disable overflo w underflo w carry user_mode interrupt_mode trap_mode sleep reserved start_next_task timeslice_enable timeslice_count
Table A.1 Status register bits (see section 3.3.2)
A.2
Exception levels
Name User exception el_breakpoint_trap el_illegal_instr_trap el_idle_trap el_schedule_exception_trap el_run_trap el_stop_trap el_timeslice_trap Description User exception levels. Breakpoint trap. Illegal op-code trap. CPU idle trap. Schedule user process as exception trap. Run process trap. Stop process trap. Timeslice trap. 0 - 255 -1 -2 -3 -4 -5 -6 -7
Exception level
Table A.2 Exception levels (see Chapter 6)
170/205
(R)
A.3
Data structures
Name Word offset 7 6 5 4 3 2 1 0 Purpose Exception handler instruction pointer. Interrupted process status register. Interrupted process task descriptor. Interrupted process instruction pointer. Interrupted process stack pointer. Interrupted process Creg. Interrupted process Breg. Interrupted process Areg.
This section defines the data structures used by the ST20-C1 core.
ex.HandlerIptr ex.InterptdStatus ex.InterptdTdesc ex.InterptdIptr ex.InterptdWptr ex.InterptdCreg ex.InterptdBreg ex.InterptdAreg
Table A.3 Exception handler (see Chapter 6)
Name pw.Iptr pw.Wptr pw.Link Word offset 2 1 0 The instruction pointer. The instruction pointer. The link to the next process in the list. Purpose
Table A.4 Process descriptor block (see section 3.3.4)
Name q.BPtrLoc q.FPtrLoc Word offset 1 0 Purpose The last process in the waiting process list. The first process in the w aiting process list.
Table A.5 Waiting process list data structure (see section 7.1)
Name s.Back s.Front s.Count Word offset 2 1 0 Purpose The last process in the semaphore waiting list. The first process in the semaphore w aiting list. The unsigned number of extraprocesses that the semaphore will allow to continue running on a wait request.
Table A.6 Semaphore data structure (see section 7.9)
171/205
(R)
A.4
Miscellaneous constants
Name Value 32 4 #80000040 #1 0 255 1 -7 63 #80000000 #7FFFFFFF Meaning The number of bits in a word. The number of bytes in a word. The base of the exception descriptor area. Bit 0 of exception vector table entry for an exception. The boolean value `false'. Highest permitted exception level.
This section lists the constants which are used in this manual.
BitsPerWord BytesPerWord ExceptionBase ExceptionProcessType false HighestException LongMode LowestException MaxTimesliceCount MostNeg MostPos NotProcess ProductId SavedT askDescriptor SchedulerQptr ShortMode true UserProcessType
smacloop Q15 data format.
Lowest permitted exception level. Maximum value of timeslice count. The most negative integer value. The most positive signed integer value. Used, wherever a process descriptor is expected, to indicate that there is no process. A value used to identify the type and revision of processor, returned by the ldprodid instruction. Address where user process exception is saved when a schedule exception trap occurs. The address of the process list data structure for the scheduler list.
MostNeg #80000000
Depends on processor type. #800000008
MostNeg #80000000
0 1 #0
smacloop Q31 data format.
The boolean value `true'. Bit 0 of exception vector table entry for a user process.
Table A.7 Constants
172/205
(R)
173/205
(R)
B Instruction set summary
This appendix gives a full listing of all the instructions and instruction components, both in function code and alphabetical order. The `Memory' column gives, in hexadecimal, the bytes that appear in memory. For primary instructions, this includes a data value, which is represented by n.
B.1
Function code order
B.1.1 Primary instructions
Code Memory Mnemonic Name
0 1 2 3 4 5 6 7 8 9 A B C D E F
0n 1n 2n 3n 4n 5n 6n 7n 8n 9n An Bn Cn Dn En Fn
jn ldlp n pfix n ldnl n ldc n ldnlp n nfix n ldl n adc n fcall n cj n ajw n eqc n stl n stnl n opr n
jump load local pointer prefix load non-local load constant load non-local pointer negative prefix load local add constant function call conditional jump adjust work space equals constant store local store non-local operate
B.1.2 Secondary instructions
Code Memory Mnemonic Name
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
rev dup rot arot add sub mul wsub not and or shl shr jab timeslice breakpoint
reverse duplicate rotate stack anti-rotate stack add subtract multiply word subscript bitwise not and or shift left shift right jump absolute timeslice breakpoint
174/205
(R)
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
21 F0 21 F1 21 F2 21 F3 21 F4 21 F5 21 F6 21 F7 21 F8 21 F9 21 FA 21 FB 21 FC 21 FD 21 FE 21 FF 22 F0 22 F1 22 F2 22 F3 22 F4 22 F5 22 F6 22 F7 22 F8 22 F9 22 FA 22 FB 22 FC 22 FD 22 FE 22 FF 23 F0 23 F1 23 F2 23 F3 23 F4 23 F5 23 F6 23 F7 23 F8 23 F9 23 FA 23 FB 23 FC 23 FD 23 FE 23 FF
addc subc mac umac smul smacinit smacloop biquad divstep unsign saturate gt gtu order orderu ashr xor xbword xsword bitld bitst bitmask statusset statusclr statustst rmw lbinc sbinc lsinc lsxinc ssinc lwinc swinc ecall eret run stop signal wait enqueue dequeue ldtdesc ldpi gajw ldprodid io swap32 nop
add with carry subtract with carry multiply accumulate unsigned multiply accumulate short multiply initialize short multiply accumulate loop short multiply accumulate loop biquad IIR filter step divide step unsign argument saturate greater than greater than unsigned order unsigned order arithmetic shift right exclusive or sign extend byte to word sign extend sixteen to word load bit store bit create bit mask set bits in status register clear bits in status register test status register read modify write load byte and increment store byte and increment load sixteen and increment load sixteen sign extended and increment store sixteen and increment load word and increment store word and increment exception call exception return run process stop process signal wait enqueue a process dequeue a process load task descriptor load pointer to instruction general adjust workspace load product identity input/output byte swap 32 no operation
175/205
(R)
B.2
Alphabetical order
Code Memory Name
Mnemonic
adc n add addc ajw n and arot ashr biquad bitld bitmask bitst breakpoint cj n dequeue divstep dup ecall enqueue eqc n eret fcall n gajw gt gtu io jn jab lbinc ldc n ldl n ldlp n ldnl n ldnlp n ldpi ldprodid ldtdesc lsinc lsxinc lwinc mac mul nfix n nop not opr n or order orderu
primary 8 secondar y 04 secondar y 10 primary B secondar y 09 secondar y 03 secondar y 1F secondar y 17 secondar y 23 secondar y 25 secondar y 24 secondar y 0F primary A secondar y 38 secondar y 18 secondar y 01 secondar y 31 secondar y 37 primary C secondar y 32 primary 9 secondar y 3B secondar y 1B secondar y 1C secondar y 3D primary 0 secondar y 0D secondar y 2A primary 4 primary 7 primary 1 primary 3 primary 5 secondar y 3A secondar y 3C secondar y 39 secondar y 2C secondar y 2D secondar y 2F secondar y 12 secondar y 06 primary 4 secondar y 3F secondar y 08 primary F secondar y 0A secondar y 1D secondar y 1E
8n F4 21 F0 Bn F9 F3 21 FF 21 F7 22 F3 22 F5 22 F4 FF An 23 F8 21 F8 F1 23 F1 23 F7 Cn 23 F2 9n 23 FB 21 FB 21 FC 23 FD 0n FD 22 FA 4n 7n 1n 3n 5n 23 FA 23 FC 23 F9 22 FC 22 FD 22 FF 21 F2 F6 4n 23 FF F8 Fn FA 21 FD 21 FE
add constant add add with carry adjust work space and anti-rotate stack arithmetic shift right biquad IIR filter step load bit create bit mask store bit breakpoint conditional jump dequeue a process divide step duplicate exception call enqueue a process equals constant exception return function call general adjust workspace greater than greater than unsigned input/output jump jump absolute load byte and increment load constant load local load local pointer load non-local load non-local pointer load pointer to instruction load product identity load task descriptor load sixteen and increment load sixteen sign extended and increment load word and increment multiply accumulate multiply negative prefix no operation bitwise not operate or order unsigned order
176/205
(R)
pfix n rev rmw rot run saturate sbinc shl shr signal smacinit smacloop smul ssinc statusclr statusset statustst stl n stnl n stop sub subc swap32 swinc timeslice umac unsign wait wsub xbword xor xsword
primary 2 secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary primary D primary E secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary secondary
00 29 02 33 1A 2B 0B 0C 35 15 16 14 2E 27 26 28 34 05 11 3E 30 0E 13 19 36 07 21 20 22
2n F0 22 F9 F2 23 F3 21 FA 22 FB FB FC 23 F5 21 F5 21 F6 21 F4 22 FE 22 F7 22 F6 22 F8 Dn En 23 F4 F5 21 F1 23 FE 23 F0 FE 21 F3 21 F9 23 F6 F7 22 F1 22 F0 22 F2
prefix reverse read modify write rotate stack run process saturate store byte and increment shift left shift right signal initialize short multiply accumulate loop short multiply accumulate loop short multiply store sixteen and increment clear bits in status register set bits in status register test status register store local store non-local stop process subtract subtract with carry byte swap 32 store word and increment timeslice unsigned multiply accumulate unsign argument wait word subscript sign extend byte to word exclusive or sign extend sixteen to word
177/205
(R)
C Compiling for the ST20-C1
C.1 Generating prefix sequences
Prefixing is intended to be performed by a compiler or assembler. Prefixing b y hand is not advised. Normally a value can be loaded into the instruction data value by a variety of different prefix sequences . It is important to use the shortest possible sequence as this enhances both code compaction and execution speed. The best method of optimizing object code so as to minimize the number of prefix instructions needed is shown below. C.1.1 Prefixing a constant The algorithm to generate a constant instruction data value e for a function op is described by the following recursive function.
prefix( op , e ) = if (e < 16 AND e 0) op( e ) else if (e 16) {prefix( pfix, e >> 4 ); op( e # F )} else if (e < 0) {prefix( nfix, (~e) >> 4 ); op( e # F )}
where ( op, e ) is the instruction component with function code op and data field e , ~ is a bitwise NOT, and >> is a logical right shift. C.1.2 Evaluating minimal symbol offsets Several primary instructions have an operand that is an offset between the current value of the instruction pointer and some other part of the code. Generating the optimal prefix sequence to create the instruction data value for one of these instructions is more complicated. This is because two, or more, instructions with offset operands can interlock so that the minimal prefix sequences for each instruction is dependent on the prefixing sequences used for the others. For example consider the interlocking jumps below which can be prefixed in two distinct ways. The instructions j and cj are respectively jump and conditional jump. These are explained in more detail later. The sequence:
cj +16; j -257
can be coded as
pfix 1; cj 0; pfix 1; nfix 0; j 15
but this can be optimized to be
cj 15; nfix 15; j 1
which is the encoding for the sequence
cj +15; j -255
178/205
(R)
This is because when the two offsets are reduced, their prefixing sequences take 1 byte less so that the two interlocking jumps will still transfer control to the same instructions as before. This compaction of non-optimal prefix sequences is difficult to perform and a better method is to slowly build up the prefix sequences so that the optimal solution is achieved. The following algorithm performs this. 5 6 7 8 Associate with each jump instruction or offset load an `estimate' of the number of bytes required to code it and initially set them all to 0. Evaluate all jump and load offsets under the current assumptions of the size of prefix sequences to the jumps and offset loads For each jump or load offset set the number of bytes needed to the number in the shortest sequence that will build up the current offset. If any change was made to the number of bytes required then go back to 2 otherwise the code has reached a stable state.
The stable state that is achieved will be the optimal state. Where the code being analyzed has alignment directives, then it is possible that this algorithm will not reach a stable state. One solution to this, is to allow the algorithm to increase the instruction size but not allow it to reduce the size. This is achieved by modifying stage 7 to choose the larger of: the currently calculated length, and the previously calculated length. This approach does not always lead to minimal sized code, but it guarantees termination of the algorithm. Steps 2 and 3 can be combined so that the number of bytes required by each jump is updated as the offset is calculated. This does mean that if an estimate is increased then some previously calculated offsets may have been invalidated, but step 4 forces another loop to be performed when those offsets can be corrected. By initially setting the estimated size of offsets to zero, all jumps whose destination is the next instruction are optimized out. Knowledge of the structure of code generated by the compiler allows this process to be performed on individual blocks of code rather than on the whole program. For example it is often possible to optimize the prefixing in the code for the sub-components of a programming language construct before the code for the construct is optimized. When optimizing the construct it is known that the sub-components are already optimal so they can each be considered as a fixed block of code which cannot be reduced. This algorithm may not be efficient for long sections of code whose underlying structure is not known. If no knowledge of the structure is available (e.g. in an assembler), all the code must be processed at once. In this case a code shrinking algorithm where in step one the initial number of bytes is set to twice the number of bytes per word is used. The prefix sequences then shr ink on each iteration of the loop. 1 or 2 iterations produce fairly good code although this method will not always produce optimal code as it will not correctly prefix the pathological example given above.
179/205
(R)
C.2
Compiling switch statements
The switch statement is a special form of conditional transfer where the transfer is determined by comparing an expression to a number of constants. When compiling the process switch x ... the expression x is evaluated and stored in a local variable by
x; stl selector
Then each branch of the switch statement
c1, ..., cn P
can be compiled by
L: M:
ldl selector; ldc c1; sub; cj L; ldl selector; ldc c2; sub; cj L; ... ldl selector; eqc cn; cj M; P; j END;
where the label END: is placed at the end of the switch statement. C.2.3 Optimal compilation of switch The compilation method given above will produce inefficient code for large switch statements. To produce more efficient code the following rules can be used. First build up a set of pairs of selector values and processes, consisting of every selector value in the switch statement along with its associated process -- the process part of each pair can be represented by the offset to the start of the compiled code for that process. Then the following rules can be used. 1 2 3 If there are 3 entries or less then use the method as described above. If there are 12 entries or less then use a binary search to limit the number of comparisons required. For more than 12 entries attempt to use a jump table. The offset of the start of each selected process is placed in the table against each selector value. Entries that do not match a selector in the switch statement must contain the offset of an error handler process. This jump table should be the largest table such that about one third of the entries are filled. This compilation strategy is then recursively called to handle the two ends. The jab (section 4.10) and ldpi (section 4.5) instructions, can be used to jump to the selected piece of code.
The choice of 3 or less entries, 12 or less entries and one third filled tab le are the values used in current ST20 compilers.
180/205
(R)
Consider compiling the following switch expression: switch X c1 P1 ... cn Pn where, for brevity, it is assumed that all the switch selectors are already in increasing order. Three entries or less This switch is compiled as: if (X = c1) P1 else if (X = c2) P2 ... else if (X = cn) Pn Four to twelve entries This switch is compiled as: if (X <= cn/2 ) { if (X <= cn/4 ) ...etc. else ... etc. } else ... etc. Using a jump table Assume that ci ... cm form a third filled jump table. Then the switch is compiled as: if ( X < ci ) { switch X c1 P1 ... ci-1 Pi-1
181/205
(R)
} else if (X > cm) ... similar else ... jump table code where jump table code is
X; ldc ci; sub; ldc jump_size; mul; ldc (jump_table-M); ldpi M: add; jab; jump_table: j case_0; j case_1; ... ; j case_k ERROR: ... error code ... code for Pi Li: ... Lm: ... code for Pm
The code at jump_table consists of a sequence of jump instructions which transfer control to the relevant branch Li ... Lm or to ERROR. The destination, case_x, of each of these jumps is Lj if cj is equal to (ci + x) and is ERROR otherwise. The code at ERROR should be the same code as used at the end of an IF statement where all the conditionals have been false. The add, ldpi and jab instructions are explained in other sections. All the jumps in the jump_table code must be encoded to the same length (jump_size bytes) to enable them to be accessed as a byte array. nop, which is a two byte instruction that performs no operation, can be used to ensure this where different operands require a different amount of prefixing. Also note that in the special case where jump_size is 1, `ldc jump_size; mul' can be removed from the sequence, and where jump_size is 4, `ldc jump_size; mul' can be removed provided add is replaced with wsub.
C.3
Expression evaluation order
Let depth(e) be the number of stack locations needed for the evaluation of expression e, defined b y
depth(constant) depth(variable) depth(function call) depth(e1 op e2)
= = = =
1 1 `infinite' if (depth(e1) > depth(e2)) depth(e1) else if (depth(e1) < depth(e2)) depth(e2) else depth(e1) + 1
That is, if the depth required for each expression is the same, then one extra stack location is required to store the result of the first expression, while the second expression is being evaluated. If the stack requirements for each expression are different,
182/205
(R)
then the total stack requirement is the larger stack requirement of the individual expressions. This is only the case if care is taken over the order of evaluation. Note that `infinite' should be taken as meaning greater than any finite depth - because a function call does not preserve values on the stack. Let the function eval( e, r ) evaluate expression e where there are r registers available to perform the evaluation. Where this expression is an operation on two sub expressions - e1 op e2 - it is efficiently evaluated by the following algorithm, where op commutes is true if a op b is equal to b op a and false otherwise:
if (depth(e1) < r AND depth(e2) < r) /* i.e. depth of both expressions is less than the number of registers available */ { if (depth(e2) > depth(e1)) { if (op commutes) { eval( e2, r ); eval( e1, r-1 ); op } else { eval( e2, r ); eval( e1, r-1 ); rev; op (*) } } else if (depth(e2) <= depth(e1)) { eval( e1, r ); eval( e2, r-1 ); op } } else if (depth(e1) >= r OR depth(e2) >= r) { if (depth(e2) >= depth(e1)) { if (depth(e1) >= r ){ /* i.e. both depths >= r */ eval( e2, r ); stl temp; eval( e1, r ); ldl temp; op } else { /* i.e. depth(e1) < r AND depth(e2) >=r */ if (op commutes) { eval( e2, r ); eval( e1, r-1 ); op } else { /* i.e. operation doesn't commute */ eval( e2, r ); eval( e1, r-1 ); rev; op } } } else if (depth(e2) < depth(e1)) { if (depth(e2) >= r) { /* i.e. both depths >= r */ if (op commutes){ eval( e1, r ); stl temp; eval( e2, r ); ldl temp; op } else { eval( e2, r ); stl temp; eval( e1, r ); ldl temp; op } } else { /* i.e. (depth(e2) < r) AND depth(e1) >= r */ eval( e1, r ); eval( e2, r-1 ); op
183/205
(R)
} } }
The justification of this is as follows. If the depth of both expressions is less than the number of registers available (r), then there is no need to store the result of the first evaluation in a temporary variable. The deeper expression is evaluated first to ensure that the operation evaluates in the least number of stack registers. If this were not done then the total depth requirement would have to be incremented due to the extra location for storing the result of the first e valuation. If both expression depths are as great as r, then a local variable (temp) must be used to store the result of the first expression. It is again better to evaluate the expression with the larger depth if possible, because this minimizes the number of local variables required. If only one of the expressions is as great as r, then provided that expression is evaluated first, there is no need to store its result in a local variable. In the cases where a temporary variable temp is required to hold the value of the first expression in the evaluation of e1 op e2, then that variable can be used as a temporary variable in the evaluation of the first e xpression. Also a temporary variable used in the evaluation of the first expression and not used to hold its result can be used in the evaluation of second expression. The code sequence
(e2; e1; rev; op)
at (*) in the above algorithm, can be optimized further to
(e1; e2; op)
removing the execution of the rev instruction. But be aware that the latter uses an extra stack register, and so there is trade-off here between evaluation depth and code size.
C.4
Loading sequence
The three registers of the evaluation stack are used to hold operands of instructions. Evaluation of an operand or parameter may involve the use of more than one register. Care is needed when evaluating such operands to ensure that the first oper and to be loaded is not pushed off the bottom of the evaluation stack by the evaluation of later operands. The processor does not detect evaluation stack overflow. Three registers are available for loading the first operand, two registers for the second and one for the third. Consequently, the instructions are designed so that Creg holds the operand which, on average, is the most complex, and Areg the operand which is the least complex. In some cases, it is necessar y to evaluate the Areg and Breg operands in advance, and to store the results in temporary variables. This can sometimes be avoided using the reverse instruction. Any of the following sequences may be used to load the operands A, B and C into Areg, Breg and Creg respectively.
184/205
(R)
1 2 3 4
C; B; A C; A; B; rev B; C; rev; A A; C; rev; B; rev
The choice of loading sequence, and of which operands should be evaluated in advance is determined by the number of registers required to evaluate each of the operands. In particular, if C requires more than two registers it must be loaded before A and B. If A or B requires more than two registers it must be evaluated before C and may need to be stored in a temporary variable if C requires more than two registers.
registers required C 2 B A temp b a load sequence instructions
1 1 1 2 2 2 >2 >2 >2
1 2 >2 1 2 >2 1 2 >2 1 2 >2 1 2 >2 1 2 >2 * * * * * * * * * * *
1 2 4 1 1 1 3 3 3 1 2 1 1 1 1 1 2 1
C; B; A C; A; B; rev A; C; rev; B; rev C; B; A A; stl a; C; B; ldl a A; stl a; C; B; ldl a B; C; rev; A A; stl a; B; C; rev; ldl a A; stl a; B; C; rev; ldl a C; B; A C; A; B; rev A; stl a; C; B; ldl a C; B; A A; stl a; C; B; ldl a A; stl a; C; B; ldl a B; stl b; C; ldl b; A B; stl b; C; A; ldl b; rev A; stl a; B; stl b; C; ldl b; ldl a
>2
1 1 1 2 2 2 >2 >2 >2
Table 8.1 Register loading sequences Table 8.1 gives the instruction sequences needed for loading three operands into the evaluation stack, where the number in the `load sequence' column refers to the list above. The columns labelled `temp' indicate where a local variable is needed to temporarily save an operand value in the load sequence. The variable `a' is used to store operand `A' and the variable `b' is used to store operand `B'.
185/205
(R)
186/205
(R)
D Glossary
Active process An active process in a multi-tasking program is one that is either executing or waiting for CPU time. It may have been interrupted or trapped or it may be waiting on a scheduling queue. Address A 32-bit integer used to identify one byte of memory or memory-mapped device. ST20 addresses are signed, i.e. the address range is the same as the range of a signed 32-bit integer. A pointer. Address space The range of valid addresses. Areg The 32-bit register at the top of the evaluation stack. Arithmetic shift A shift in which the sign bit is preserved. An arithmetic right shift copies the sign bit into the vacated bits. Array A data structure in which all the elements are of the same type and are identified by a non-negative integer subscript. A vector. Bit A binary digit, which can take the value 1 (also called set or true) or 0 (also called clear or false). Bitwise operation An operation on multi-bit values which treats each bit as a true (if set) or false (if clear). Boolean Logic values and calculations using AND, OR, and NOT. A multi-bit value is treated as Boolean if it represents a single true of false value, as opposed to bitwise. Breg The 32-bit register in the middle of the evaluation stack. Byte An 8-bit value or location. Cache Fast memory used to temporarily hold copies of slow memory locations in order to provide high performance memory access with slow memory. Channel A synchronous, one-way, point-to-point, unbuffered communications mechanism.
187/205
(R)
Clear A bit is clear if it has the value 0. Creg The 32-bit register at the bottom of the evaluation stack. Current process The process being either being executed by the CPU, interrupted or trapped. DCU Diagnostic Controller Unit. Descriptor Identifier of a process or exception. Deschedule A process is descheduled when it is not being executed by the CPU and has not been interrupted or trapped by an exception handler but it is capable of being restarted. To deschedule a process is to make it descheduled. Diagnostic Controller Unit ST20 hardware macro-cell which provides fully non-intrusive breakpoints, watchpoints and code tracing. DMA Direct memory access. A peripheral or external device performs DMA when it reads directly from or writes directly to memory. DMA peripheral A peripheral with a DMA engine. Double word A 64-bit value, occupying two words. Evaluation stack The Areg, Breg and Creg, which together behave as a stack. The evaluation stack is used for expression evaluation and to hold operands for instructions. The registers cannot be individually addressed, but values can be pushed onto the stack or popped off it. Exception A mechanism for handling events outside the normal program flow. An exception may be an interrupt or a trap. Exception handler The code and data which defines the action when an exception is taken. An exception handler may be a trap handler or an interrupt handler. Exception level The signed integer specifying which exception will be taken, which is the word offset from the base of the exception vector table to the descriptor of the process or exception.
188/205
(R)
Exception vector table The table mapping exception levels to user processes and exception descriptors. Executing A process is executing if the CPU is currently modifying its state. Only one process can be executing at any one time. External interrupt An interrupt generated by a peripheral or external device, routed through the interrupt controller and handled by an interrupt handler. Function A named section of code which may have parameters and may return one or more values. Function A primary instruction. Function code The most significant 4-bits of an instruction component, which defines the action of the component. Half-word A 16-bit value or location. Half-word aligned An address is half-word aligned if it is divisible by two, i.e. if bit 0 is 0. Idle The CPU is idle when there is no process or exception handler executing. Inactive A process is inactive if it is capable of being restarted but it is currently waiting for some event to occur before it can continue. The event might be a semaphore signal or a peripheral completing a DMA. Instruction A code element corresponding to a single instruction set mnemonic, consisting of zero or more prefixes followed by a non-prefix instruction component. Instruction component A single byte of code, consisting of a function code and 4 bits of data. Instruction data value The operand of a primary instruction, which is built from the least significant 4 bits of each of the instruction components of the instruction. Instruction pointer The 32-bit CPU register which holds the address of the next instruction to be executed by the current process. Integer A whole number, which may be positive, negative or zero.
189/205
(R)
Internal memory On-chip memory. Interrupt A signal from outside the CPU to switch from normal execution to execution of an interrupt handler. If the interrupt causes a scheduling event, then that may cause an immediate trap. Interrupt controller An on-chip peripheral which arbitrates between interrupt requests and signals to the CPU when an interrupt is required. Interrupt handler The code and data which defines the action when an interrupt is taken. Interrupt latency The time taken from receiving an interrupt signal to starting execution of the interrupt handler. Interrupted A process is in an interrupted state if it is waiting for an interrupt handler to complete and it will continue as soon as the interrupt handler has completed. IOreg The input and output register, whose bits are directly connected to peripherals. Iptr The instruction pointer. Linked list A linked list is an ordered data structure where each element includes a pointer to the next element. A list is identified b y a front pointer, which points to the first element in the list. Linked lists are also called queues. Little-endian Little-endian ordering of data, used by all ST20s, means that less significant data is always held in lower addresses. This applies to bits in bytes, bytes in words and words in memory. List See linked list. Local A local variable is one that is addressed by means of an offset from the workspace pointer. Local variables are usually held on a stack and are defined within a function or procedure. cf. non-local. Logical shift A shift in which the sign bit is shifted by the same amount as any other bit.
190/205
(R)
Memory Addressed devices which can be read and possibly written. Reading read/write memory at a particular address gives the value that was last written to that address. Reading read-only memory always gives the same value. Memory-mapped Addressed devices which are read and written as if they were memory, but do not necessarily return the last written value. Microcode The on-chip hardware mechanism for implementing instructions. Micro-kernel The ST20 on-chip hardware support for multi-tasking. MostNeg The most negative 32-bit integer, #80000000. The address of the base of memory. The value of the null process pointer NotProcess. Multi-tasking Running several communicating processes on the same processor using time sharing. Non-local A non-local variable is one that is addressed by means of an offset from the base address held in Areg. Non-local variables are usually not held on a stack and are defined globally. cf. local. NotProcess The null process pointer, which has the value MostNeg. On-chip emulator Diagnostic controller unit. On-chip memory Very fast memory included in the ST20 chip. On-chip memory is usually located at the bottom of the address space. Operate An instruction component, with function code #F and mnemonic opr, used to encode secondary instructions. The instruction data value identifies the secondar y instruction. Operating system Code controlling multi-tasking, interrupts and certain system operations, which may be called by a program. The ST20 supports many of these facilities in microcode without an operating system. Operation A secondary instruction. Peripheral A device connected to and controlled by the CPU.
191/205
(R)
Pointer An address. Prefix An instruction component used to extend the data value of the following instruction component. There are two prefixes - pfix (function code #2) which preserves the sign and nfix (function code #6) which complements the data value and so changes the sign. Primary instruction One of thirteen directly coded instructions which therefore do not require an operate instruction component. A primary instruction has a data part associated with it, called the operand. cf. secondary instruction. Priority A value associated with each process or interrupt which is used to make choices when conflicts occur . Procedure A named section of code which may have parameters. Process A sequential algorithm and data which can run in parallel with other processes.Processes are also known as tasks or threads. Each process is identified b y a task descriptor which points to a process descriptor block. Process descriptor block A data structure defining a process, including the saved instruction pointer, workspace pointer and a link used for queueing processes. Process pointer A pointer to a process. Queue A linked list. Real time operating system An operating system providing facilities for real-time programming, including multitasking and interrupt handling. Register cache Registers used to cache the lowest words of the work space in order to accelerate access to local variables. Also called work space cache or register file. Register file See register cache. RTOS Real-time operating system. Scheduler Code or hardware which controls how processes share CPU time.
192/205
(R)
Scheduling kernel Code to override the ST20 built-in microcode scheduler, consisting mainly of trap handlers which are taken when a scheduling event occurs. Scheduling queue The queue of processes waiting for CPU time. Secondary instruction An instruction encoded using an operate instruction component. A secondary instruction has no data part associated with it, as the instruction data value is used to identify the instruction. cf. primary instruction. Semaphore A mechanism for synchronizing and signalling between processes. Set A bit is set when its value is 1. Sign extension A signed value can be extended to occupy more bits by filling the most significant bits with copies of the sign bit. Sign bit The most significant bit of a signed value, used to indicate the sign of the value. Signed value Value which could be positive or negative depending on the sign bit. Sleep The CPU is asleep when its clocks are turned off to reduce power consumption. The CPU automatically goes to sleep after it becomes idle. It is woken by an external interrupt. Slot A slot is a field of a data structure for holding a value. A slot normally occupies one word. Stack A stack is a data storage structure that uses a `first in, last out' access model. Adding an item to a stack is called pushing on, and extracting a value is called popping. A program stack is normally implemented using the Wptr as the stack pointer. The Areg, Breg and Creg form the evaluation stack. Stack, evaluation See evaluation stack. State The state of a process is the data needed for the process to continue execution. This may include the evaluation stack, Iptr, Wptr, status register and local data. Status register The status register holds CPU status information which must be saved if an interrupt occurs but is not saved if the process deschedules. Some status register
193/205
(R)
bits are global and are preserved when a process is descheduled and others are local to a process and are lost when the process is descheduled. Task A process. Task descriptor The identifier of a process, which is a pointer to the process descriptor block. When a process is executing, the Tdesc holds the task descriptor of the process. Tdesc A 32-bit register which holds the task descriptor of the current process. Thread A dynamically generated process. Timeslice Deschedule a process to allow other processes access to the CPU. Timeslicing prevents any one process from occupying too much CPU time. A timesliced process is normally added to the back of the scheduling queue and the process on the front of the queue is restarted. Trap An exception caused by software, or an interrupt initiating a scheduling event. Trap handler The code and data which define what happens when a trap is taken. Trapped A process is trapped when it is waiting for a trap handler to complete. Unsigned value Value which could only be positive, so the most significant bit is not used as a sign bit. Vector An array. Waiting A process is waiting for an event if it cannot continue execution until that event has occurred. The event could be for example a communication or a semaphore signal. Word A word is a four byte (32-bit) location for data. Word-aligned An address is word-aligned if it is divisible by 4, i.e. if the least significant two bits are zero. Work space Space in memory for local variables and values.
194/205
(R)
Work space cache Registers used to cache the lowest words of the work space in order to accelerate access to local variables. Also called register cache or register file Workspace pointer A 32-bit register pointing to the work space for the current process. Wptr The workspace pointer.
195/205
(R)
196/205
(R)
Index
Index
Symbols
+ (plus), 11 .. (ellipsis), 9 / (divide), 11 < (less than), 11 = (equals), 11 > (greater than), 11 >> (logical shift right), 11 @ (word offset), 10 { } (braces), 13 (prime), 10 (less than or equal to), 11 (greater than or equal to), 11 (is not equal to), 11 add, 89 boolean, 45-47 divide, 102 modulo, 11 multiply accumulate, 56-67 saturated, 138 shift, 187 shift right, 94 subtract, 155 subtract with carry, 156 unchecked, 11 unsign, 162 arot, 30, 93 arrays, 41, 187 word subscript, 164 ashr, 48, 94 assignment byte, 35 single word, 35
Numerics
16-bit, 32 assignment, 35 load, 31, 124 signed, 31, 125 multiply, 146 multiply accumulate, 56, 143, 144 sign extension, 40, 167 store, 31, 147
B
biquad, 56, 58, 95 bit, 187 load, 47, 96 mask, 47 read modify write, 47, 135 store, 47, 98 bitld, 47, 96 bitmask, 47, 52, 97 BitsPerWord, 172 bitst, 47, 98 bitwise, 187 and, 11 exclusive or, 166 not, 11, 130 operations, 47 or, 11, 131 xor, 11 block copy, 31, 33 load, 126 store, 158 boolean, 187 expressions, 45-47 branch, 43, 100 breakpoint, 71, 99 trap, 71 Breg, 20, 21, 30, 187 buffer size multiply accumulate, 57 byte, 32, 187 addressing, 19
A
active, 78, 187 adc, 36, 88 add, 36, 89 constant, 36, 88 with carry, 38, 90 addc, 38, 90 address, 187 calculation, 10, 41-43 load local, 41 load non-local, 41 space, 18, 187 subscript, 41 addressing, 16 adjust work space, 91 ajw, 48, 91 alignment, 18 and bitwise, 11, 47 boolean, 11, 46 instruction, 92 anti-rotate stack, 30, 93 architecture, 16-28 Areg, 20, 21, 30, 187 arithmetic, 35-40
197/205
(R)
Index
arrays, 43 assignment, 35 load, 31 load and increment, 115 ordering compatibility, 17 selector, 18 sign extension, 40, 165 store, 31, 139 swap, 48, 157 BytesPerWord, 172 process queue, 80 conversion object length, 17, 40 type, 12 create bit mask, 97 Creg, 20, 21, 30, 188 current process, 188
D
data format multiply accumulate, 58, 65 data structures, 171 data types, 9 DCU, 188 decrement, 36, 88 definition in instruction descriptions, 7, 8 depth of stack needed, 182 dequeue a process, 79, 101 descheduled process, 79, 188 state, 80, 82 description in instruction descriptions, 7 descriptor, 188 in exception vector table, 72 of process, 79 task, 24 diagnostic controller unit, 188 direct memory access, 53 disabling exceptions, 77 timeslicing, 81 division, 36, 102 divstep, 36, 102 DMA, 188 peripherals, 53 double word, 19, 188 arithmetic, 38 multiply, 39 DSP, 56-67 dup, 30, 103 duplicate top of stack, 30, 103 dynamic allocation of workspace, 52
C
cache, 187 call exception, 104 function, 48-52, 108 procedure, 48-52 carry, 22, 38 channel, 187 channel-type peripherals, 53 checked arithmetic, 36 cj, 43, 100 clear, 188 status bit, 55 status bits, 148 clock timeslicing, 81 code in instruction descriptions, 6 coding of instruction, 6, 24 comments in instruction descriptions, 8, 9 comparison, 43-45 equals constant, 43, 106 greater than, 43, 110 unsigned, 43, 111 order, 132 unsigned, 133 compatibility byte ordering, 17 complement, 47 conditional jump, 43, 100 conditionals, 43-45 conditions in instruction descriptions, 13 constants, 172 equals, 43 loading, 31, 116 from table, 34 machine constant definitions, 14, 169 used in instruction descriptions, 14, 172 control block exception, 73
E
ecall, 71, 104 ellipsis, 9 else, 13 emulator, 191 enabling exceptions, 77 timeslicing, 81 encoding of instructions, 6, 24
198/205
(R)
Index
enqueue a process, 79, 105 eqc, 43, 106 equals constant, 43, 106 eret, 71, 107 error signals in instruction descriptions, 8 evaluation arithmetic, 35-40 boolean, 45-47 function, 51 of addresses, 41-43 of expressions, 33-35, 182-185 stack, 19, 21, 30, 33-35, 182-185, 188 loading sequences, 34 subscripts, 42 exception, 70-77, 188 call, 104 control block, 73 handler, 188 initial state, 74 levels, 71, 170, 188 return, 107 type, 72 vector table, 72, 189 ExceptionBase, 72, 172 ExceptionProcessType, 73, 172 exclusive or, 166 bitwise, 47 executing, 78, 189 expression depth, 182-184 evaluation, 33-35, 182-185 using functions, 51 extend sign, 40, 165 external interrupt, 189 greater than, 43, 110 unsigned, 43, 111 gt, 43, 110 gtu, 43, 111
H
half-word, 16, 32, 189 aligned, 19, 189 assignment, 35 load, 31, 124 signed, 31, 125 multiply, 36, 146 multiply accumulate, 56, 143, 144 sign extension, 40, 167 store, 31, 147 high_word, 12 HighestException, 172
I
i/o, 52-55 identity of device, 14, 122 idle, 85, 189 trap, 72 if compiling, 43-45 in definitions, 13 IIR filter, 58, 95 inactive, 78, 82, 189 process restarting, 82 increment, 36, 88 indirection of exceptions, 72 initializing exception handlers, 76 multi-tasking, 83 smacloop, 143 input/output, 52-55, 112 instruction, 14, 189 address, 41 component, 25, 189 data value, 25, 189 definitions, 87 encoding, 6, 24 pointer, 20, 189 load, 41, 121 int16, 12 integer, 189 length conversion, 40 internal memory, 190 interrupt, 70-77, 190 controller, 190 enable, 22 handler, 190
F
false, 16, 43, 172 fcall, 48, 108 filter biquad IIR, 58 IIR step, 95 function, 189 call, 48-52, 108 code, 7, 25, 189 evaluation, 51 functions in instruction descriptions, 12
G
gajw, 48, 109 general adjust workspace, 109 global_interrupt_enable, 22, 77
199/205
(R)
Index
latency, 190 interrupt_mode, 22, 23 interrupted, 78, 190 io, 52, 112 IOreg, 20, 24, 52, 190 Iptr, 20, 190 load, 41 local, 19, 23, 31, 190 load pointer, 118 load variable, 31, 117 store variable, 31, 151 local_interrupt_enable, 22, 77 logic, 45-47 bitwise instructions, 47 logical shift, 190 long arithmetic, 38 multiplication, 127 subtract, 156 LongMode, 58, 172 low_word, 12 LowestException, 172 lsinc, 31, 124 lsxinc, 31, 125 lwinc, 31, 126
J
j, 43, 113 jab, 43, 48, 114 jump, 43-45, 112, 113 absolute, 43, 114 conditional, 43, 100 table, 181
L
lbinc, 31, 115 ldc, 31, 116 ldl, 31, 117 ldlp, 41, 118 ldnl, 31, 119 ldnlp, 41, 120 ldpi, 41, 121 ldprodid, 14, 122 ldtdesc, 52, 79, 123 length conversion, 17 link in queue, 80 linked list, 80, 190 list, 190 little-endian, 16, 19, 157, 190 load, 31-33 bit, 47, 96 byte, 31, 115 constant, 31, 116 instruction pointer, 41 local, 31, 117 local pointer, 41, 118 non-local, 31, 119 non-local pointer, 41, 120 operands, 34 parameters, 50 pointer to instruction, 121 product identity, 122 sequence integer stack, 34 sequences, 184 sixteen bit, 31, 124 signed, 125 sixteen bit signed, 125 task descriptor, 123 word, 31, 126
M
mac, 56, 127 mac_buffer, 22, 57 mac_count, 22, 57 mac_mode, 22, 57 mac_scale, 22, 57 mask, 47 create, 97 maximum, 45, 132 unsigned, 133 MaxTimesliceCount, 81, 172 memory, 18, 191 block copy, 33 mapped, 191 mapped peripherals, 53 read modify write, 47, 135 representation of, 10 microcode, 191 micro-kernel, 191 minimum, 45, 132 unsigned, 133 minus, 36, 38 modulo arithmetic, 11 MostNeg, 16, 172, 191 MostPos, 16, 172 move, 33 mul, 36, 128 multiply, 36, 128 half-word, 36 multiple length, 39 multiply accumulate, 56-67, 127 short loop, 144 initialize, 143 unsigned, 38, 161 multi-tasking, 78-85, 191
200/205
(R)
Index
N
negation, 38 next instruction, 9 nfix, 25 no operation, 129 non-local, 31, 191 load pointer, 120 load variable, 31, 119 store variable, 152 nop, 129 not bitwise, 47 boolean, 11, 46 instruction, 130 notation, 6-14 NotProcess, 18, 172, 191
P
parallel programming, 78-85 parameters procedure, 50 peripherals, 52-55, 191 pfix, 25 plus, 36 pointer, 192 load local, 41 load non-local, 41 reserved values, 18 pop, 21 prefixing, 25, 178, 192 primary instructions, 8, 25, 26, 192 prime notation in instruction descriptions, 10 priority, 78, 170, 192 procedure, 192 call, 48-52 process, 78-85, 192 creation, 83 current, 188 dequeue, 101 descriptor, 24, 170 descriptor block, 79, 192 enqueue, 105 pointer, 192 queues, 80 run, 137 state, 9, 24, 79 in descriptions, 7 states, 78 stop, 153 timeslice, 159 transitions, 78 user, 22 waiting, 80 product identity, 14, 122 ProductId, 172 push, 21 pw.Iptr, 82 pw.Wptr, 82
O
object length conversion, 40 objects, 9 On-chip emulator, 191 on-chip memory, 191 operands in descriptions, 6 primary instructions, 26 secondary instructions, 34 operate, 25, 191 operating system, 191 operation, 191 code, 6, 25, 27 operations bitwise, 47 operators in instruction definitions, 11 in instruction descriptions, 11 opr, 8, 27 or bitwise, 11, 47 boolean, 11, 46 instruction, 131 order, 43 of loads, 184 unsigned, 43 order, 132 ordering of data, 16 values, 43-45 orderu, 133 output, 52-55, 112 overflow, 22, 36, 37 saturate, 138
Q
q.BptrLoc, 80, 171 q.FPtrLoc, 80, 171 Q15, 58 Q31, 58, 65 queue, 192 control block, 80 scheduling, 80
201/205
(R)
Index
R
read modify write, 47, 135 real time operating system, 192 register, 20 cache, 20, 192 file, 192 in instruction descriptions, 7 memory mapped, 53 rem, 11 remainder, 36 repeat loop, 45 representing memory in instruction descriptions, 10 rescheduling a process, 82 reserved memory, 18 return exception, 107 from procedure, 49 rev, 30, 134 reverse stack, 30, 134 rmw, 47, 135 rot, 30, 136 rotate stack, 30, 136 RTOS, 192 run, 79, 82, 84, 137 trap, 72
S
s.Back, 79, 84, 171 s.Count, 84, 171 s.Front, 84, 171 saturate, 37, 138 SavedTaskDescriptor, 172 sbinc, 31, 139 scaling multiply accumulate, 59 scheduler, 192 SchedulerQptr, 80, 172 scheduling, 78-85 kernel, 193 kernels, 84 queue, 80, 193 secondary instructions, 8, 25, 27, 193 semaphore, 84-85, 193 signal (V), 84, 85, 142 wait (P), 85, 163 set, 193 status bits, 55, 149 setting up exception handler, 76 multi-tasking, 83 process, 83 semaphore, 84 shift, 48
instructions, 47 logical left, 140 right arithmetic, 94 right logical, 141 shl, 48, 140 short multiply, 36, 146 multiply accumulate loop, 144 store, 147 ShortMode, 58, 172 shr, 48, 141 sign bit, 193 extension, 17, 40, 193 byte, 165 half-word, 31, 167 signal, 84, 142 signal processing, 56-67 signed arithmetic, 17 shift right, 48 value, 193 sixteen bit, 16, 32 assignment, 35 load, 31, 124 signed, 31, 125 multiply, 36, 146 multiply accumulate, 56, 143, 144 sign extension, 40, 167 store, 31, 147 sleep, 22, 23, 85, 193 slot, 16, 193 smacinit, 56, 57, 143 smacloop, 56, 144 initialize, 143 smul, 36, 146 ssinc, 31, 147 stack, 193 anti-rotate, 30 duplicate, 30, 103 evaluation, 19, 21, 193 memory, 19 reverse, 30, 134 rotate, 30, 136 start_next_task, 22, 23 state, 193 status register, 8, 20, 21, 55, 170, 193 clear bits, 148 exception handler start state, 74 restart value, 83 restarted process state, 83 set bits, 149 test, 150 statusclr, 55, 148
202/205
(R)
Index
statusset, 55, 149 statustst, 55, 150 steps multiply accumulate, 57 stl, 31, 151 stnl, 31, 152 stop, 52, 79, 84, 153 trap, 72 store, 31-33 bit, 47, 98 byte, 31, 139 local, 31, 151 non-local, 31, 152 sixteen bit, 31 word, 31, 158 sub, 36, 155 subc, 38, 156 subprogram call, 48-52 subscript, 42 address, 41 evaluation, 42 in instruction definitions, 9 word, 164 subtract, 36, 155 with carry, 38, 156 swap bytes, 48, 157 swap32, 17, 48, 157 swinc, 31, 158 switch statements, 180 system traps, 71, 84
U
umac, 38, 56, 161 unary minus, 38 unchecked arithmetic, 11 undefined, 9 underflow, 22, 36, 37 saturate, 138 unsign, 36, 162 unsigned, 10 greater than, 43, 111 multiply accumulate, 38, 161 order, 133 value, 194 user process, 22 user_mode, 22 UserProcessType, 73, 172
V
values, 16 variable address, 41 allocating work space for, 49 assignment, 35 vector, 194 exception table, 72
W
wait, 84, 163 waiting process, 79, 80, 194 while loop, 45 word, 16, 194 address, 18 aligned, 18, 194 arrays, 43 assignment, 35 load, 126 multiply accumulate, 56 store, 158 subscript, 41, 164 work space, 19, 23, 170, 194 address, 23 adjust, 49, 91 cache, 20, 195 dynamic allocation, 52 general adjust, 109 pointer, 19, 23, 195 Wptr, 19, 20, 23, 195 wsub, 41, 164
T
table of constants loading from, 34 task, 78, 194 descriptor, 79, 194 load, 123 Tdesc, 20, 24, 194 test status bit, 55, 150 threads, 78, 194 timeslice, 45, 79, 81, 159, 194 trap, 72 timeslice_count, 22, 81 timeslice_enable, 22, 81 trap_mode, 22, 23 trapped, 78, 194 traps, 70-77, 194 handlers, 194 true, 16, 43, 172 type conversion, 12 of data, 9 of exception, 72
X
x (multiply), 11 xbword, 40, 165 xor
203/205
(R)
Index
bitwise, 11, 47, 166 xsword, 40, 167
204/205
(R)
Information furnished is believed to be accurate and reliable. However, SGS-THOMSON Microelectronics assumes no responsibility for the consequences of use of such information nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent or patent rights of SGS-THOMSON Microelectronics. Specifications mentioned in this pub lication are subject to change without notice. This publication supersedes and replaces all information previously supplied. SGS-THOMSON Microelectronics products are not authorized for use as critical components in life support devices or systems without express written approval of SGS-THOMSON Microelectronics. (c) 1997 SGS-THOMSON Microelectronics - All Rights Reserved IMS and DS-Link(R) are trademarks of SGS-THOMSON Microelectronics Limited. is a registered trademark of the SGS-THOMSON Microelectronics Group.
(R)
Document number: 72-TRN-274-01
205/205
(R)

▲Up To Search▲

Price & Availability of ST20-C1

	To Download ST20-C1 Datasheet File
If you can't view the Datasheet, Please click here to try to view without PDF Reader .